Direct language model alignment from online AI feedback

S Guo, B Zhang, T Liu, T Liu, M Khalman… - arXiv preprint arXiv …, 2024 - arxiv.org
Direct alignment from preferences (DAP) methods, such as DPO, have recently emerged as
efficient alternatives to reinforcement learning from human feedback (RLHF) that do not …
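The snippet is cut off before it describes the mechanism; for orientation only, here is a minimal sketch of the standard DPO objective, one of the DAP methods the snippet names. The tensor names (e.g. `policy_chosen_logps`) are illustrative and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Log-ratio of the trained policy vs. a frozen reference model,
    # for the preferred (chosen) and dispreferred (rejected) responses.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Push the chosen log-ratio above the rejected one (Bradley-Terry style).
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Dummy summed log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-10.0, -12.0]), torch.tensor([-11.5, -12.5]),
                torch.tensor([-10.5, -12.2]), torch.tensor([-11.0, -12.4]))
```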

Motif: Intrinsic motivation from artificial intelligence feedback

M Klissarov, P D'Oro, S Sodhani, R Raileanu… - arXiv preprint arXiv …, 2023 - arxiv.org
Exploring rich environments and evaluating one's actions without prior knowledge is
immensely challenging. In this paper, we propose Motif, a general method to interface such …

Learning goal-conditioned policies offline with self-supervised reward shaping

L Mezghani, S Sukhbaatar… - … on robot learning, 2023 - proceedings.mlr.press
Developing agents that can execute multiple skills by learning from pre-collected datasets is
an important problem in robotics, where online interaction with the environment is extremely …

Curious exploration via structured world models yields zero-shot object manipulation

C Sancaktar, S Blaes, G Martius - Advances in Neural …, 2022 - proceedings.neurips.cc
It has been a long-standing dream to design artificial agents that explore their environment
efficiently via intrinsic motivation, similar to how children perform curious free play. Despite …

Towards robust offline-to-online reinforcement learning via uncertainty and smoothness

X Wen, X Yu, R Yang, C Bai, Z Wang - arXiv preprint arXiv:2309.16973, 2023 - arxiv.org
To obtain a near-optimal policy with fewer interactions in Reinforcement Learning (RL), a
promising approach involves the combination of offline RL, which enhances sample …

Learning general world models in a handful of reward-free deployments

Y Xu, J Parker-Holder, A Pacchiano… - Advances in …, 2022 - proceedings.neurips.cc
Building generally capable agents is a grand challenge for deep reinforcement learning
(RL). To approach this challenge practically, we outline two key desiderata: 1) to facilitate …

Investigating the role of model-based learning in exploration and transfer

JC Walker, E Vértes, Y Li… - International …, 2023 - proceedings.mlr.press
State-of-the-art reinforcement learning has enabled training agents on tasks of ever-
increasing complexity. However, the current paradigm tends to favor training agents from …

Think before you act: Unified policy for interleaving language reasoning with actions

L Mezghani, P Bojanowski, K Alahari… - arXiv preprint arXiv …, 2023 - arxiv.org
The success of transformer models trained with a language modeling objective brings a
promising opportunity to the reinforcement learning framework. Decision Transformer is a …

Reliable conditioning of behavioral cloning for offline reinforcement learning

T Nguyen, Q Zheng, A Grover - arXiv preprint arXiv:2210.05158, 2022 - arxiv.org
Behavioral cloning (BC) provides a straightforward solution to offline RL by mimicking offline
trajectories via supervised learning. Recent advances (Chen et al., 2021; Janner et al …
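Since the snippet describes BC as mimicking offline trajectories via supervised learning, a minimal sketch of that basic idea is given below. The discrete-action, cross-entropy setup and the `bc_update` helper are assumptions for illustration, not the conditioning scheme studied in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def bc_update(policy: nn.Module, optimizer, states, actions):
    """One behavioral cloning step: fit the policy to the actions logged in
    the offline dataset with a supervised (cross-entropy) loss."""
    logits = policy(states)                    # (batch, num_actions)
    loss = F.cross_entropy(logits, actions)    # imitate the dataset actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```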

Mastering stacking of diverse shapes with large-scale iterative reinforcement learning on real robots

T Lampe, A Abdolmaleki, S Bechtle… - … on Robotics and …, 2024 - ieeexplore.ieee.org
Reinforcement learning solely from an agent's self-generated data is often believed to be
infeasible for learning on real robots, due to the amount of data needed. However, if done …