Supervised pretraining can learn in-context reinforcement learning

J Lee, A Xie, A Pacchiano, Y Chandak… - Advances in …, 2024 - proceedings.neurips.cc
Large transformer models trained on diverse datasets have shown a remarkable ability to
learn in-context, achieving high few-shot performance on tasks they were not explicitly …

Epistemic neural networks

I Osband, Z Wen, SM Asghari… - Advances in …, 2023 - proceedings.neurips.cc
Intelligence relies on an agent's knowledge of what it does not know. This capability can be
assessed based on the quality of joint predictions of labels across multiple inputs. In …

Self-exploring language models: Active preference elicitation for online alignment

S Zhang, D Yu, H Sharma, H Zhong, Z Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Preference optimization, particularly through Reinforcement Learning from Human
Feedback (RLHF), has achieved significant success in aligning Large Language Models …

Making RL with preference-based feedback efficient via randomization

R Wu, W Sun - arXiv preprint arXiv:2310.14554, 2023 - arxiv.org
Reinforcement Learning algorithms that learn from human feedback (RLHF) need to be
efficient in terms of statistical complexity, computational complexity, and query complexity. In …

Efficient exploration for LLMs

V Dwaracherla, SM Asghari, B Hao… - arXiv preprint arXiv …, 2024 - arxiv.org
We present evidence of substantial benefit from efficient exploration in gathering human
feedback to improve large language models. In our experiments, an agent sequentially …

Position paper: Bayesian deep learning in the age of large-scale AI

T Papamarkou, M Skoularidou, K Palla… - arXiv e …, 2024 - ui.adsabs.harvard.edu
In the current landscape of deep learning research, there is a predominant emphasis on
achieving high predictive accuracy in supervised tasks involving large image and language …

Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI

T Papamarkou, M Skoularidou, K Palla… - … on Machine Learning, 2024 - openreview.net
In the current landscape of deep learning research, there is a predominant emphasis on
achieving high predictive accuracy in supervised tasks involving large image and language …

Reinforcement Learning: An Overview

K Murphy - arXiv preprint arXiv:2412.05265, 2024 - arxiv.org
This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement
learning and sequential decision making, covering value-based RL, policy-gradient …

Pearl: A Production-ready Reinforcement Learning Agent

Z Zhu, R de Salvo Braz, J Bhandari, D Jiang… - Journal of Machine …, 2024 - jmlr.org
Reinforcement learning (RL) is a versatile framework for optimizing long-term goals.
Although many real-world problems can be formalized with RL, learning and deploying a …

Satisficing exploration for deep reinforcement learning

D Arumugam, S Kumar, R Gummadi… - arXiv preprint arXiv …, 2024 - arxiv.org
A default assumption in the design of reinforcement-learning algorithms is that a
decision-making agent always explores to learn optimal behavior. In sufficiently complex …