Direct language model alignment from online AI feedback

S Guo, B Zhang, T Liu, T Liu, M Khalman… - arXiv preprint arXiv …, 2024 - arxiv.org
Direct alignment from preferences (DAP) methods, such as DPO, have recently emerged as
efficient alternatives to reinforcement learning from human feedback (RLHF) that do not …
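The snippet is cut off before it describes the mechanism; for orientation only, here is a minimal sketch of the standard DPO objective, one of the DAP methods the snippet names. The tensor names (e.g. `policy_chosen_logps`) are illustrative and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Log-ratio of the trained policy vs. a frozen reference model,
    # for the preferred (chosen) and dispreferred (rejected) responses.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Push the chosen log-ratio above the rejected one (Bradley-Terry style).
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Dummy summed log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-10.0, -12.0]), torch.tensor([-11.5, -12.5]),
                torch.tensor([-10.5, -12.2]), torch.tensor([-11.0, -12.4]))
```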

Motif: Intrinsic motivation from artificial intelligence feedback

M Klissarov, P D'Oro, S Sodhani, R Raileanu… - arXiv preprint arXiv …, 2023 - arxiv.org
Exploring rich environments and evaluating one's actions without prior knowledge is
immensely challenging. In this paper, we propose Motif, a general method to interface such …

Learning goal-conditioned policies offline with self-supervised reward shaping

L Mezghani, S Sukhbaatar… - … on robot learning, 2023 - proceedings.mlr.press
Developing agents that can execute multiple skills by learning from pre-collected datasets is
an important problem in robotics, where online interaction with the environment is extremely …

Curious exploration via structured world models yields zero-shot object manipulation

C Sancaktar, S Blaes, G Martius - Advances in Neural …, 2022 - proceedings.neurips.cc
It has been a long-standing dream to design artificial agents that explore their environment
efficiently via intrinsic motivation, similar to how children perform curious free play. Despite …

Towards robust offline-to-online reinforcement learning via uncertainty and smoothness

X Wen, X Yu, R Yang, C Bai, Z Wang - arXiv preprint arXiv:2309.16973, 2023 - arxiv.org
To obtain a near-optimal policy with fewer interactions in Reinforcement Learning (RL), a
promising approach involves the combination of offline RL, which enhances sample …

Learning general world models in a handful of reward-free deployments

Y Xu, J Parker-Holder, A Pacchiano… - Advances in …, 2022 - proceedings.neurips.cc
Building generally capable agents is a grand challenge for deep reinforcement learning
(RL). To approach this challenge practically, we outline two key desiderata: 1) to facilitate …

Investigating the role of model-based learning in exploration and transfer

JC Walker, E Vértes, Y Li… - International …, 2023 - proceedings.mlr.press
State-of-the-art reinforcement learning has enabled training agents on tasks of ever-
increasing complexity. However, the current paradigm tends to favor training agents from …

Think before you act: Unified policy for interleaving language reasoning with actions

L Mezghani, P Bojanowski, K Alahari… - arXiv preprint arXiv …, 2023 - arxiv.org
The success of transformer models trained with a language modeling objective brings a
promising opportunity to the reinforcement learning framework. Decision Transformer is a …

Reliable conditioning of behavioral cloning for offline reinforcement learning

T Nguyen, Q Zheng, A Grover - arXiv preprint arXiv:2210.05158, 2022 - arxiv.org
Behavioral cloning (BC) provides a straightforward solution to offline RL by mimicking offline
trajectories via supervised learning. Recent advances (Chen et al., 2021; Janner et al …
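Since the snippet describes BC as mimicking offline trajectories via supervised learning, a minimal sketch of that basic idea is given below. The discrete-action, cross-entropy setup and the `bc_update` helper are assumptions for illustration, not the conditioning scheme studied in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def bc_update(policy: nn.Module, optimizer, states, actions):
    """One behavioral cloning step: fit the policy to the actions logged in
    the offline dataset with a supervised (cross-entropy) loss."""
    logits = policy(states)                    # (batch, num_actions)
    loss = F.cross_entropy(logits, actions)    # imitate the dataset actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```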

Mastering stacking of diverse shapes with large-scale iterative reinforcement learning on real robots

T Lampe, A Abdolmaleki, S Bechtle… - … on Robotics and …, 2024 - ieeexplore.ieee.org
Reinforcement learning solely from an agent's self-generated data is often believed to be
infeasible for learning on real robots, due to the amount of data needed. However, if done …