Train once, get a family: State-adaptive balances for offline-to-online reinforcement learning

G Xudong, F Dawei, K Xu, Y Zhai, C Yao… - … on Machine Learning, 2024 - openreview.net

Imitation learning heavily relies on the quality of provided demonstrations. In scenarios
where demonstrations are imperfect and rare, a prevalent approach for refining policies is …

OLLIE: Imitation Learning from Offline Pretraining to Online Finetuning

S Yue, X Hua, J Ren, S Lin, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

In this paper, we study offline-to-online Imitation Learning (IL) that pretrains an imitation
policy from static demonstration data, followed by fast finetuning with minimal environmental …

Advantage-Aware Policy Optimization for Offline Reinforcement Learning

Y Qing, J Cong, K Chen, Y Zhou, M Song - arXiv preprint arXiv …, 2024 - arxiv.org

Offline Reinforcement Learning (RL) endeavors to leverage offline datasets to craft effective
agent policy without online interaction, which imposes proper conservative constraints with …

Energy-Guided Diffusion Sampling for Offline-to-Online Reinforcement Learning

XH Liu, TS Liu, S Jiang, R Chen, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Combining offline and online reinforcement learning (RL) techniques is indeed crucial for
achieving efficient and safe learning where data acquisition is expensive. Existing methods …

Data-Driven Reinforcement Learning for Optimal Motor Control in Washing Machines

C Kang, G Bae, D Kim, K Lee, D Son, C Lee, J Lee… - ieeecai.org

In this paper, we address the challenge of developing advanced motor control systems for
modern washing machines, which are required to operate under various conditions …

Poisoning Offline Reinforcement Learning to Promote Distributional Shift in Online Finetuning

Z Yu, S Kang, X Zhang - cs.uic.edu

Offline-to-online reinforcement learning has recently been shown effective in reducing the
online sample complexity by first training from offline collected data. However, this additional …