IDQL: Implicit Q-learning as an actor-critic method with diffusion policies
Effective offline RL methods require properly handling out-of-distribution actions. Implicit Q-
learning (IQL) addresses this by training a Q-function using only dataset actions through a …
HIQL: Offline goal-conditioned RL with latent states as actions
Unsupervised pre-training has recently become the bedrock for computer vision and natural
language processing. In reinforcement learning (RL), goal-conditioned RL can potentially …
Inverse preference learning: Preference-based RL without a reward function
Reward functions are difficult to design and often hard to align with human intent. Preference-
based Reinforcement Learning (RL) algorithms address these problems by learning reward …
Contrastive preference learning: Learning from human feedback without RL
Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular
paradigm for aligning models with human intent. Typically RLHF algorithms operate in two …
For SALE: State-action representation learning for deep reinforcement learning
In reinforcement learning (RL), representation learning is a proven tool for complex image-
based tasks, but is often overlooked for environments with low-level states, such as physical …
Double Gumbel Q-learning
DYT Hui, AC Courville… - Advances in Neural …, 2023 - proceedings.neurips.cc
Abstract We show that Deep Neural Networks introduce two heteroscedastic Gumbel noise
sources into Q-Learning. To account for these noise sources, we propose Double Gumbel Q …
Maximum entropy GFlowNets with soft Q-learning
S Mohammadpour, E Bengio… - International …, 2024 - proceedings.mlr.press
Abstract Generative Flow Networks (GFNs) have emerged as a powerful tool for sampling
discrete objects from unnormalized distributions, offering a scalable alternative to Markov …
Consistency models as a rich and efficient policy class for reinforcement learning
Score-based generative models like the diffusion model have been testified to be effective in
modeling multi-modal data from image generation to reinforcement learning (RL). However …
Uni-O4: Unifying online and offline deep reinforcement learning with multi-step on-policy optimization
Combining offline and online reinforcement learning (RL) is crucial for efficient and safe
learning. However, previous approaches treat offline and online learning as separate …
PROTO: Iterative policy regularized offline-to-online reinforcement learning
Offline-to-online reinforcement learning (RL), by combining the benefits of offline pretraining
and online finetuning, promises enhanced sample efficiency and policy performance …