Direct language model alignment from online AI feedback
Direct alignment from preferences (DAP) methods, such as DPO, have recently emerged as
efficient alternatives to reinforcement learning from human feedback (RLHF), that do not …
Motif: Intrinsic motivation from artificial intelligence feedback
Exploring rich environments and evaluating one's actions without prior knowledge is
immensely challenging. In this paper, we propose Motif, a general method to interface such …
Learning goal-conditioned policies offline with self-supervised reward shaping
L Mezghani, S Sukhbaatar… - … on robot learning, 2023 - proceedings.mlr.press
Developing agents that can execute multiple skills by learning from pre-collected datasets is
an important problem in robotics, where online interaction with the environment is extremely …
Curious exploration via structured world models yields zero-shot object manipulation
It has been a long-standing dream to design artificial agents that explore their environment
efficiently via intrinsic motivation, similar to how children perform curious free play. Despite …
Towards robust offline-to-online reinforcement learning via uncertainty and smoothness
To obtain a near-optimal policy with fewer interactions in Reinforcement Learning (RL), a
promising approach involves the combination of offline RL, which enhances sample …
Learning general world models in a handful of reward-free deployments
Building generally capable agents is a grand challenge for deep reinforcement learning
(RL). To approach this challenge practically, we outline two key desiderata: 1) to facilitate …
Investigating the role of model-based learning in exploration and transfer
State of the art reinforcement learning has enabled training agents on tasks of ever
increasing complexity. However, the current paradigm tends to favor training agents from …
Think before you act: Unified policy for interleaving language reasoning with actions
The success of transformer models trained with a language modeling objective brings a
promising opportunity to the reinforcement learning framework. Decision Transformer is a …
Reliable conditioning of behavioral cloning for offline reinforcement learning
Behavioral cloning (BC) provides a straightforward solution to offline RL by mimicking offline
trajectories via supervised learning. Recent advances (Chen et al., 2021; Janner et al …
Mastering stacking of diverse shapes with large-scale iterative reinforcement learning on real robots
Reinforcement learning solely from an agent's self-generated data is often believed to be
infeasible for learning on real robots, due to the amount of data needed. However, if done …