Efficient diffusion policies for offline reinforcement learning

B Kang, X Ma, C Du, T Pang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Offline reinforcement learning (RL) aims to learn optimal policies from offline datasets,
where the parameterization of policies is crucial but often overlooked. Recently, Diffusion-QL …

Train once, get a family: State-adaptive balances for offline-to-online reinforcement learning

S Wang, Q Yang, J Gao, M Lin… - Advances in …, 2024 - proceedings.neurips.cc
Offline-to-online reinforcement learning (RL) is a training paradigm that combines pre-
training on a pre-collected dataset with fine-tuning in an online environment. However, the …

Understanding, predicting and better resolving Q-value divergence in offline-RL

Y Yue, R Lu, B Kang, S Song… - Advances in Neural …, 2024 - proceedings.neurips.cc
The divergence of the Q-value estimation has been a prominent issue in offline reinforcement
learning (offline RL), where the agent has no access to real dynamics. Traditional beliefs …

Boosting offline reinforcement learning with action preference query

Q Yang, S Wang, MG Lin, S Song… - … on Machine Learning, 2023 - proceedings.mlr.press
Training practical agents usually involves offline and online reinforcement learning (RL) to
balance the policy's performance and interaction costs. In particular, online fine-tuning has …

Score regularized policy optimization through diffusion behavior

H Chen, C Lu, Z Wang, H Su, J Zhu - arXiv preprint arXiv:2310.07297, 2023 - arxiv.org
Recent developments in offline reinforcement learning have uncovered the immense
potential of diffusion modeling, which excels at representing heterogeneous behavior …

Exploring Text-to-Motion Generation with Human Preference

J Sheng, M Lin, A Zhao, K Pruvost… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper presents an exploration of preference learning in text-to-motion generation. We
find that current improvements in text-to-motion generation still rely on datasets requiring …

Model-based trajectory stitching for improved behavioural cloning and its applications

CA Hepburn, G Montana - Machine Learning, 2024 - Springer
Behavioural cloning (BC) is a commonly used imitation learning method to infer a sequential
decision-making policy from expert demonstrations. However, when the quality of the data is …

[PDF] A Trajectory Perspective on the Role of Data Sampling Techniques in Offline Reinforcement Learning

J Liu, Y Ma, J Hao, Y Hu, Y Zheng, T Lv… - Proceedings of the 23rd …, 2024 - ifaamas.org
In recent years, offline reinforcement learning (RL) algorithms have gained considerable
attention. However, the role of data sampling techniques in offline RL has been somewhat …

Federated ensemble-directed offline reinforcement learning

D Rengarajan, N Ragothaman, D Kalathil… - arXiv preprint arXiv …, 2023 - arxiv.org
We consider the problem of federated offline reinforcement learning (RL), a scenario under
which distributed learning agents must collaboratively learn a high-quality control policy only …

Using offline data to speed-up reinforcement learning in procedurally generated environments

A Andres, L Schäfer, E Villar-Rodriguez… - arXiv preprint arXiv …, 2023 - arxiv.org
One of the key challenges of Reinforcement Learning (RL) is the ability of agents to
generalise their learned policy to unseen settings. Moreover, training RL agents requires …