Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity

L Shi, G Li, Y Wei, Y Chen… - … conference on machine …, 2022 - proceedings.mlr.press
Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …

Diffusion model is an effective planner and data synthesizer for multi-task reinforcement learning

H He, C Bai, K Xu, Z Yang, W Zhang… - Advances in neural …, 2024 - proceedings.neurips.cc
Diffusion models have demonstrated highly-expressive generative capabilities in vision and
NLP. Recent studies in reinforcement learning (RL) have shown that diffusion models are …

CALVIN: A benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks

O Mees, L Hermann, E Rosete-Beas… - IEEE Robotics and …, 2022 - ieeexplore.ieee.org
General-purpose robots coexisting with humans in their environment must learn to relate
human language to their perceptions and actions to be useful in a range of daily tasks …

How to leverage unlabeled data in offline reinforcement learning

T Yu, A Kumar, Y Chebotar… - International …, 2022 - proceedings.mlr.press
Offline reinforcement learning (RL) can learn control policies from static datasets but, like
standard RL methods, it requires reward annotations for every transition. In many cases …

Pre-training for robots: Offline RL enables learning new tasks from a handful of trials

A Kumar, A Singh, F Ebert, M Nakamoto… - arXiv preprint arXiv …, 2022 - arxiv.org
Progress in deep learning highlights the tremendous potential of utilizing diverse robotic
datasets for attaining effective generalization and makes it enticing to consider leveraging …

Hierarchical diffusion for offline decision making

W Li, X Wang, B Jin, H Zha - International Conference on …, 2023 - proceedings.mlr.press
Offline reinforcement learning typically introduces a hierarchical structure to solve the long-horizon
problem so as to address its thorny issue of variance accumulation. Problems of …

Beyond uniform sampling: Offline reinforcement learning with imbalanced datasets

ZW Hong, A Kumar, S Karnik… - Advances in …, 2023 - proceedings.neurips.cc
Offline reinforcement learning (RL) enables learning a decision-making policy without
interaction with the environment. This makes it particularly beneficial in situations where …

Don't start from scratch: Leveraging prior data to automate robotic reinforcement learning

HR Walke, JH Yang, A Yu, A Kumar… - … on Robot Learning, 2023 - proceedings.mlr.press
Reinforcement learning (RL) algorithms hold the promise of enabling autonomous skill
acquisition for robotic systems. However, in practice, real-world robotic RL typically requires …

The efficacy of pessimism in asynchronous Q-learning

Y Yan, G Li, Y Chen, J Fan - IEEE Transactions on Information …, 2023 - ieeexplore.ieee.org
This paper is concerned with the asynchronous form of Q-learning, which applies a
stochastic approximation scheme to Markovian data samples. Motivated by the recent …

Future-conditioned unsupervised pretraining for decision transformer

Z Xie, Z Lin, D Ye, Q Fu, Y Wei… - … Conference on Machine …, 2023 - proceedings.mlr.press
Recent research in offline reinforcement learning (RL) has demonstrated that return-conditioned
supervised learning is a powerful paradigm for decision-making problems …