Efficient diffusion policies for offline reinforcement learning

B Kang, X Ma, C Du, T Pang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Offline reinforcement learning (RL) aims to learn optimal policies from offline datasets,
where the parameterization of policies is crucial but often overlooked. Recently, Diffusion-QL …

Train once, get a family: State-adaptive balances for offline-to-online reinforcement learning

S Wang, Q Yang, J Gao, M Lin… - Advances in …, 2024 - proceedings.neurips.cc
Offline-to-online reinforcement learning (RL) is a training paradigm that combines pre-
training on a pre-collected dataset with fine-tuning in an online environment. However, the …

Understanding, predicting and better resolving Q-value divergence in offline-RL

Y Yue, R Lu, B Kang, S Song… - Advances in Neural …, 2024 - proceedings.neurips.cc
The divergence of the Q-value estimation has been a prominent issue in offline reinforcement
learning (offline RL), where the agent has no access to real dynamics. Traditional beliefs …

Boosting offline reinforcement learning with action preference query

Q Yang, S Wang, MG Lin, S Song… - … on Machine Learning, 2023 - proceedings.mlr.press
Training practical agents usually involves offline and online reinforcement learning (RL) to
balance the policy's performance and interaction costs. In particular, online fine-tuning has …

Score regularized policy optimization through diffusion behavior

H Chen, C Lu, Z Wang, H Su, J Zhu - arXiv preprint arXiv:2310.07297, 2023 - arxiv.org
Recent developments in offline reinforcement learning have uncovered the immense
potential of diffusion modeling, which excels at representing heterogeneous behavior …

Exploring Text-to-Motion Generation with Human Preference

J Sheng, M Lin, A Zhao, K Pruvost… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper presents an exploration of preference learning in text-to-motion generation. We
find that current improvements in text-to-motion generation still rely on datasets requiring …

Model-based trajectory stitching for improved behavioural cloning and its applications

CA Hepburn, G Montana - Machine Learning, 2024 - Springer
Behavioural cloning (BC) is a commonly used imitation learning method to infer a sequential
decision-making policy from expert demonstrations. However, when the quality of the data is …

[PDF] A Trajectory Perspective on the Role of Data Sampling Techniques in Offline Reinforcement Learning

J Liu, Y Ma, J Hao, Y Hu, Y Zheng, T Lv… - Proceedings of the 23rd …, 2024 - ifaamas.org
In recent years, offline reinforcement learning (RL) algorithms have gained considerable
attention. However, the role of data sampling techniques in offline RL has been somewhat …

Federated ensemble-directed offline reinforcement learning

D Rengarajan, N Ragothaman, D Kalathil… - arXiv preprint arXiv …, 2023 - arxiv.org
We consider the problem of federated offline reinforcement learning (RL), a scenario under
which distributed learning agents must collaboratively learn a high-quality control policy only …

Using offline data to speed-up reinforcement learning in procedurally generated environments

A Andres, L Schäfer, E Villar-Rodriguez… - arXiv preprint arXiv …, 2023 - arxiv.org
One of the key challenges of Reinforcement Learning (RL) is the ability of agents to
generalise their learned policy to unseen settings. Moreover, training RL agents requires …