Is conditional generative modeling all you need for decision-making?
Recent improvements in conditional generative modeling have made it possible to generate
high-quality images from language descriptions alone. We investigate whether these …
Mildly conservative Q-learning for offline reinforcement learning
Offline reinforcement learning (RL) defines the task of learning from a static logged dataset
without continually interacting with the environment. The distribution shift between the …
Hierarchical diffusion for offline decision making
Offline reinforcement learning typically introduces a hierarchical structure to solve the long-
horizon problem so as to address its thorny issue of variance accumulation. Problems of …
Train once, get a family: State-adaptive balances for offline-to-online reinforcement learning
Offline-to-online reinforcement learning (RL) is a training paradigm that combines pre-
training on a pre-collected dataset with fine-tuning in an online environment. However, the …
VOCE: Variational optimization with conservative estimation for offline safe reinforcement learning
Offline safe reinforcement learning (RL) algorithms promise to learn policies that satisfy
safety constraints directly in offline datasets without interacting with the environment. This …
Design from policies: Conservative test-time adaptation for offline policy optimization
In this work, we decouple the iterative bi-level offline RL (value estimation and policy
extraction) from the offline training phase, forming a non-iterative bi-level paradigm and …
Semi-supervised offline reinforcement learning with action-free trajectories
Natural agents can effectively learn from multiple data sources that differ in size, quality, and
types of measurements. We study this heterogeneity in the context of offline reinforcement …
Anti-exploration by random network distillation
Despite the success of Random Network Distillation (RND) in various domains, it was shown
as not discriminative enough to be used as an uncertainty estimator for penalizing out-of …
Policy expansion for bridging offline-to-online reinforcement learning
Pre-training with offline data and online fine-tuning using reinforcement learning is a
promising strategy for learning control policies by leveraging the best of both worlds in terms …
Online tree-based planning for active spacecraft fault estimation and collision avoidance
Autonomous robots operating in uncertain or hazardous environments subject to state safety
constraints must be able to identify and isolate faulty components in a time-optimal manner …