IDQL: Implicit Q-learning as an actor-critic method with diffusion policies

P Hansen-Estruch, I Kostrikov, M Janner… - arXiv preprint arXiv …, 2023 - arxiv.org
Effective offline RL methods require properly handling out-of-distribution actions. Implicit Q-
learning (IQL) addresses this by training a Q-function using only dataset actions through a …

HIQL: Offline goal-conditioned RL with latent states as actions

S Park, D Ghosh, B Eysenbach… - Advances in Neural …, 2024 - proceedings.neurips.cc
Unsupervised pre-training has recently become the bedrock for computer vision and natural
language processing. In reinforcement learning (RL), goal-conditioned RL can potentially …

Inverse preference learning: Preference-based RL without a reward function

J Hejna, D Sadigh - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Reward functions are difficult to design and often hard to align with human intent. Preference-
based Reinforcement Learning (RL) algorithms address these problems by learning reward …

Contrastive preference learning: Learning from human feedback without RL

J Hejna, R Rafailov, H Sikchi, C Finn, S Niekum… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular
paradigm for aligning models with human intent. Typically RLHF algorithms operate in two …

For SALE: State-action representation learning for deep reinforcement learning

S Fujimoto, WD Chang, E Smith… - Advances in …, 2024 - proceedings.neurips.cc
In reinforcement learning (RL), representation learning is a proven tool for complex image-
based tasks, but is often overlooked for environments with low-level states, such as physical …

Double Gumbel Q-learning

DYT Hui, AC Courville… - Advances in Neural …, 2023 - proceedings.neurips.cc
We show that Deep Neural Networks introduce two heteroscedastic Gumbel noise
sources into Q-Learning. To account for these noise sources, we propose Double Gumbel Q …

Maximum entropy GFlowNets with soft Q-learning

S Mohammadpour, E Bengio… - International …, 2024 - proceedings.mlr.press
Generative Flow Networks (GFNs) have emerged as a powerful tool for sampling
discrete objects from unnormalized distributions, offering a scalable alternative to Markov …

Consistency models as a rich and efficient policy class for reinforcement learning

Z Ding, C Jin - arXiv preprint arXiv:2309.16984, 2023 - arxiv.org
Score-based generative models like the diffusion model have been testified to be effective in
modeling multi-modal data from image generation to reinforcement learning (RL). However …

Uni-O4: Unifying online and offline deep reinforcement learning with multi-step on-policy optimization

K Lei, Z He, C Lu, K Hu, Y Gao, H Xu - arXiv preprint arXiv:2311.03351, 2023 - arxiv.org
Combining offline and online reinforcement learning (RL) is crucial for efficient and safe
learning. However, previous approaches treat offline and online learning as separate …

PROTO: Iterative policy regularized offline-to-online reinforcement learning

J Li, X Hu, H Xu, J Liu, X Zhan, YQ Zhang - arXiv preprint arXiv …, 2023 - arxiv.org
Offline-to-online reinforcement learning (RL), by combining the benefits of offline pretraining
and online finetuning, promises enhanced sample efficiency and policy performance …