Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity
Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …
Diffusion model is an effective planner and data synthesizer for multi-task reinforcement learning
Diffusion models have demonstrated highly-expressive generative capabilities in vision and
NLP. Recent studies in reinforcement learning (RL) have shown that diffusion models are …
CALVIN: A benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks
General-purpose robots coexisting with humans in their environment must learn to relate
human language to their perceptions and actions to be useful in a range of daily tasks …
How to leverage unlabeled data in offline reinforcement learning
Offline reinforcement learning (RL) can learn control policies from static datasets but, like
standard RL methods, it requires reward annotations for every transition. In many cases …
Pre-training for robots: Offline RL enables learning new tasks from a handful of trials
Progress in deep learning highlights the tremendous potential of utilizing diverse robotic
datasets for attaining effective generalization and makes it enticing to consider leveraging …
Hierarchical diffusion for offline decision making
Offline reinforcement learning typically introduces a hierarchical structure to solve the long-
horizon problem so as to address its thorny issue of variance accumulation. Problems of …
Beyond uniform sampling: Offline reinforcement learning with imbalanced datasets
Offline reinforcement learning (RL) enables learning a decision-making policy without
interaction with the environment. This makes it particularly beneficial in situations where …
Don't start from scratch: Leveraging prior data to automate robotic reinforcement learning
Reinforcement learning (RL) algorithms hold the promise of enabling autonomous skill
acquisition for robotic systems. However, in practice, real-world robotic RL typically requires …
The efficacy of pessimism in asynchronous Q-learning
This paper is concerned with the asynchronous form of Q-learning, which applies a
stochastic approximation scheme to Markovian data samples. Motivated by the recent …
Future-conditioned unsupervised pretraining for decision transformer
Recent research in offline reinforcement learning (RL) has demonstrated that return-
conditioned supervised learning is a powerful paradigm for decision-making problems …