Uni-o4: Unifying online and offline deep reinforcement learning with multi-step on-policy optimization
Combining offline and online reinforcement learning (RL) is crucial for efficient and safe
learning. However, previous approaches treat offline and online learning as separate …
Simple Ingredients for Offline Reinforcement Learning
Offline reinforcement learning algorithms have proven effective on datasets highly
connected to the target downstream task. Yet, leveraging a novel testbed (MOOD) in which …
OASIS: Conditional Distribution Shaping for Offline Safe Reinforcement Learning
Offline safe reinforcement learning (RL) aims to train a policy that satisfies constraints using
a pre-collected dataset. Most current methods struggle with the mismatch between imperfect …
Advantage-Aware Policy Optimization for Offline Reinforcement Learning
Offline Reinforcement Learning (RL) endeavors to leverage offline datasets to craft effective
agent policy without online interaction, which imposes proper conservative constraints with …
Adaptive Advantage-Guided Policy Regularization for Offline Reinforcement Learning
In offline reinforcement learning, the challenge of out-of-distribution (OOD) is pronounced.
To address this, existing methods often constrain the learned policy through policy …
Importance Sampling-Guided Meta-Training for Intelligent Agents in Highly Interactive Environments
Training intelligent agents to navigate highly interactive environments presents significant
challenges. While guided meta reinforcement learning (RL) approach that first trains a …
Offline Fictitious Self-Play for Competitive Games
Offline Reinforcement Learning (RL) has received significant interest due to its ability to
improve policies in previously collected datasets without online interactions. Despite its …
Batch Learning via Log-Sum-Exponential Estimator from Logged Bandit Feedback
Offline policy learning methods in batch learning aim to derive a policy from a logged bandit
feedback dataset, encompassing context, action, propensity score and feedback for each …
Enhancing Offline Reinforcement Learning with an Optimal Supported Dataset
C Chen, Z Xu, Y Mao, H Zhang, X Ji - openreview.net
Offline Reinforcement Learning (Offline RL) is challenged by distributional shift and value
overestimation, which often leads to poor performance. To address this issue, a popular …