Efficient diffusion policies for offline reinforcement learning
Offline reinforcement learning (RL) aims to learn optimal policies from offline datasets,
where the parameterization of policies is crucial but often overlooked. Recently, Diffusion-QL …
Train once, get a family: State-adaptive balances for offline-to-online reinforcement learning
Offline-to-online reinforcement learning (RL) is a training paradigm that combines pre-
training on a pre-collected dataset with fine-tuning in an online environment. However, the …
Understanding, predicting and better resolving Q-value divergence in offline-RL
The divergence of the Q-value estimation has been a prominent issue in offline reinforcement
learning (offline RL), where the agent has no access to real dynamics. Traditional beliefs …
Boosting offline reinforcement learning with action preference query
Training practical agents usually involves offline and online reinforcement learning (RL) to
balance the policy's performance and interaction costs. In particular, online fine-tuning has …
Score regularized policy optimization through diffusion behavior
Recent developments in offline reinforcement learning have uncovered the immense
potential of diffusion modeling, which excels at representing heterogeneous behavior …
Exploring Text-to-Motion Generation with Human Preference
This paper presents an exploration of preference learning in text-to-motion generation. We
find that current improvements in text-to-motion generation still rely on datasets requiring …
Model-based trajectory stitching for improved behavioural cloning and its applications
CA Hepburn, G Montana - Machine Learning, 2024 - Springer
Behavioural cloning (BC) is a commonly used imitation learning method to infer a sequential
decision-making policy from expert demonstrations. However, when the quality of the data is …
A Trajectory Perspective on the Role of Data Sampling Techniques in Offline Reinforcement Learning
In recent years, offline reinforcement learning (RL) algorithms have gained considerable
attention. However, the role of data sampling techniques in offline RL has been somewhat …
Federated ensemble-directed offline reinforcement learning
We consider the problem of federated offline reinforcement learning (RL), a scenario under
which distributed learning agents must collaboratively learn a high-quality control policy only …
Using offline data to speed-up reinforcement learning in procedurally generated environments
One of the key challenges of Reinforcement Learning (RL) is the ability of agents to
generalise their learned policy to unseen settings. Moreover, training RL agents requires …