Uni-o4: Unifying online and offline deep reinforcement learning with multi-step on-policy optimization

K Lei, Z He, C Lu, K Hu, Y Gao, H Xu - arXiv preprint arXiv:2311.03351, 2023 - arxiv.org
Combining offline and online reinforcement learning (RL) is crucial for efficient and safe
learning. However, previous approaches treat offline and online learning as separate …

Simple Ingredients for Offline Reinforcement Learning

E Cetin, A Tirinzoni, M Pirotta, A Lazaric… - arXiv preprint arXiv …, 2024 - arxiv.org
Offline reinforcement learning algorithms have proven effective on datasets highly
connected to the target downstream task. Yet, leveraging a novel testbed (MOOD) in which …

OASIS: Conditional Distribution Shaping for Offline Safe Reinforcement Learning

Y Yao, Z Cen, W Ding, H Lin, S Liu, T Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Offline safe reinforcement learning (RL) aims to train a policy that satisfies constraints using
a pre-collected dataset. Most current methods struggle with the mismatch between imperfect …

Advantage-Aware Policy Optimization for Offline Reinforcement Learning

Y Qing, J Cong, K Chen, Y Zhou, M Song - arXiv preprint arXiv …, 2024 - arxiv.org
Offline Reinforcement Learning (RL) endeavors to leverage offline datasets to craft an effective
agent policy without online interaction, which imposes proper conservative constraints with …

Adaptive Advantage-Guided Policy Regularization for Offline Reinforcement Learning

T Liu, Y Li, Y Lan, H Gao, W Pan, X Xu - arXiv preprint arXiv:2405.19909, 2024 - arxiv.org
In offline reinforcement learning, the challenge of out-of-distribution (OOD) is pronounced.
To address this, existing methods often constrain the learned policy through policy …

Importance Sampling-Guided Meta-Training for Intelligent Agents in Highly Interactive Environments

M Arief, M Timmerman, J Li, D Isele… - arXiv preprint arXiv …, 2024 - arxiv.org
Training intelligent agents to navigate highly interactive environments presents significant
challenges. While the guided meta reinforcement learning (RL) approach that first trains a …

Offline Fictitious Self-Play for Competitive Games

J Chen, W Xie, W Zhang, Y Wen - arXiv preprint arXiv:2403.00841, 2024 - arxiv.org
Offline Reinforcement Learning (RL) has received significant interest due to its ability to
improve policies in previously collected datasets without online interactions. Despite its …

Batch Learning via Log-Sum-Exponential Estimator from Logged Bandit Feedback

A Behnamnia, G Aminian, A Aghaei, C Shi… - ICML 2024 Workshop … - openreview.net
Offline policy learning methods in batch learning aim to derive a policy from a logged bandit
feedback dataset, encompassing context, action, propensity score and feedback for each …

Enhancing Offline Reinforcement Learning with an Optimal Supported Dataset

C Chen, Z Xu, Y Mao, H Zhang, X Ji - openreview.net
Offline Reinforcement Learning (Offline RL) is challenged by distributional shift and value
overestimation, which often leads to poor performance. To address this issue, a popular …