Is conditional generative modeling all you need for decision-making?
Recent improvements in conditional generative modeling have made it possible to generate
high-quality images from language descriptions alone. We investigate whether these …
Mildly conservative Q-learning for offline reinforcement learning
Offline reinforcement learning (RL) defines the task of learning from a static logged dataset
without continually interacting with the environment. The distribution shift between the …
Hierarchical diffusion for offline decision making
Offline reinforcement learning typically introduces a hierarchical structure to solve the long-
horizon problem so as to address its thorny issue of variance accumulation. Problems of …
Train once, get a family: State-adaptive balances for offline-to-online reinforcement learning
Offline-to-online reinforcement learning (RL) is a training paradigm that combines pre-
training on a pre-collected dataset with fine-tuning in an online environment. However, the …
VOCE: Variational optimization with conservative estimation for offline safe reinforcement learning
Offline safe reinforcement learning (RL) algorithms promise to learn policies that satisfy
safety constraints directly in offline datasets without interacting with the environment. This …
Design from policies: Conservative test-time adaptation for offline policy optimization
In this work, we decouple the iterative bi-level offline RL (value estimation and policy
extraction) from the offline training phase, forming a non-iterative bi-level paradigm and …
Semi-supervised offline reinforcement learning with action-free trajectories
Natural agents can effectively learn from multiple data sources that differ in size, quality, and
types of measurements. We study this heterogeneity in the context of offline reinforcement …
Anti-exploration by random network distillation
Despite the success of Random Network Distillation (RND) in various domains, it was shown
as not discriminative enough to be used as an uncertainty estimator for penalizing out-of …
Policy expansion for bridging offline-to-online reinforcement learning
Pre-training with offline data and online fine-tuning using reinforcement learning is a
promising strategy for learning control policies by leveraging the best of both worlds in terms …
Online tree-based planning for active spacecraft fault estimation and collision avoidance
Autonomous robots operating in uncertain or hazardous environments subject to state safety
constraints must be able to identify and isolate faulty components in a time-optimal manner …