Extreme q-learning: Maxent rl without entropy

P Hansen-Estruch, I Kostrikov, M Janner… - arXiv preprint arXiv …, 2023 - arxiv.org

Effective offline RL methods require properly handling out-of-distribution actions. Implicit
Q-learning (IQL) addresses this by training a Q-function using only dataset actions through a …

Cited by 110 · Related articles · All 4 versions

Hiql: Offline goal-conditioned rl with latent states as actions

S Park, D Ghosh, B Eysenbach… - Advances in Neural …, 2024 - proceedings.neurips.cc

Unsupervised pre-training has recently become the bedrock for computer vision and natural
language processing. In reinforcement learning (RL), goal-conditioned RL can potentially …

Cited by 37 · Related articles · All 6 versions

Inverse preference learning: Preference-based rl without a reward function

J Hejna, D Sadigh - Advances in Neural Information …, 2024 - proceedings.neurips.cc

Reward functions are difficult to design and often hard to align with human intent.
Preference-based Reinforcement Learning (RL) algorithms address these problems by learning reward …

Cited by 42 · Related articles · All 9 versions

Contrastive preference learning: Learning from human feedback without rl

J Hejna, R Rafailov, H Sikchi, C Finn, S Niekum… - arXiv preprint arXiv …, 2023 - arxiv.org

Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular
paradigm for aligning models with human intent. Typically RLHF algorithms operate in two …

Cited by 55 · Related articles · All 5 versions

For sale: State-action representation learning for deep reinforcement learning

S Fujimoto, WD Chang, E Smith… - Advances in …, 2024 - proceedings.neurips.cc

In reinforcement learning (RL), representation learning is a proven tool for complex
image-based tasks, but is often overlooked for environments with low-level states, such as physical …

Cited by 52 · Related articles · All 5 versions

Double gumbel q-learning

DYT Hui, AC Courville… - Advances in Neural …, 2023 - proceedings.neurips.cc

We show that Deep Neural Networks introduce two heteroscedastic Gumbel noise
sources into Q-Learning. To account for these noise sources, we propose Double Gumbel Q …

Cited by 8 · Related articles · All 3 versions

Maximum entropy GFlowNets with soft Q-learning

S Mohammadpour, E Bengio… - International …, 2024 - proceedings.mlr.press

Generative Flow Networks (GFNs) have emerged as a powerful tool for sampling
discrete objects from unnormalized distributions, offering a scalable alternative to Markov …

Cited by 8 · Related articles · All 3 versions

Consistency models as a rich and efficient policy class for reinforcement learning

Z Ding, C Jin - arXiv preprint arXiv:2309.16984, 2023 - arxiv.org

Score-based generative models like the diffusion model have been testified to be effective in
modeling multi-modal data from image generation to reinforcement learning (RL). However …

Cited by 22 · Related articles · All 3 versions

Uni-o4: Unifying online and offline deep reinforcement learning with multi-step on-policy optimization

K Lei, Z He, C Lu, K Hu, Y Gao, H Xu - arXiv preprint arXiv:2311.03351, 2023 - arxiv.org

Combining offline and online reinforcement learning (RL) is crucial for efficient and safe
learning. However, previous approaches treat offline and online learning as separate …

Cited by 15 · Related articles · All 3 versions

Proto: Iterative policy regularized offline-to-online reinforcement learning

J Li, X Hu, H Xu, J Liu, X Zhan, YQ Zhang - arXiv preprint arXiv …, 2023 - arxiv.org

Offline-to-online reinforcement learning (RL), by combining the benefits of offline pretraining
and online finetuning, promises enhanced sample efficiency and policy performance …

Cited by 17 · Related articles · All 4 versions