How to fine-tune the model: Unified model shift and model bias policy optimization

H Zhang, H Yu, J Zhao, D Zhang… - Advances in …, 2024 - proceedings.neurips.cc
Designing and deriving effective model-based reinforcement learning (MBRL) algorithms
with a performance improvement guarantee is challenging, mainly attributed to the high …

Seizing serendipity: Exploiting the value of past success in off-policy actor-critic

T Ji, Y Luo, F Sun, X Zhan, J Zhang, H Xu - arXiv preprint arXiv …, 2023 - arxiv.org
Learning high-quality $Q$-value functions plays a key role in the success of many modern
off-policy deep reinforcement learning (RL) algorithms. Previous works primarily focus on …

Dyna-style Model-based reinforcement learning with Model-Free Policy Optimization

K Dong, Y Luo, Y Wang, Y Liu, C Qu, Q Zhang… - Knowledge-Based …, 2024 - Elsevier
Dyna-style Model-based reinforcement learning (MBRL) methods have demonstrated
superior sample efficiency compared to their model-free counterparts, largely attributable to …

Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL

Y Luo, T Ji, F Sun, J Zhang, H Xu, X Zhan - arXiv preprint arXiv …, 2024 - arxiv.org
Off-policy reinforcement learning (RL) has achieved notable success in tackling many
complex real-world tasks, by leveraging previously collected data for policy learning …

Understanding world models through multi-step pruning policy via reinforcement learning

Z He, W Qiu, W Zhao, X Shao, Z Liu - Information Sciences, 2025 - Elsevier
In model-based reinforcement learning, the conventional approach to addressing world
model bias is to use gradient optimization methods. However, using a singular policy from …

Model-Based Reinforcement Learning with Isolated Imaginations

M Pan, X Zhu, Y Zheng, Y Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
World models learn the consequences of actions in vision-based interactive systems.
However, in practical scenarios like autonomous driving, noncontrollable dynamics that are …

Scrutinize What We Ignore: Reining Task Representation Shift In Context-Based Offline Meta Reinforcement Learning

H Zhang, B Zheng, A Guo, T Ji, PA Heng… - arXiv preprint arXiv …, 2024 - arxiv.org
Offline meta reinforcement learning (OMRL) has emerged as a promising approach for
interaction avoidance and strong generalization performance by leveraging pre-collected …

OMPO: A Unified Framework for RL under Policy and Dynamics Shifts

Y Luo, T Ji, F Sun, J Zhang, H Xu, X Zhan - arXiv preprint arXiv …, 2024 - arxiv.org
Training reinforcement learning policies using environment interaction data collected from
varying policies or dynamics presents a fundamental challenge. Existing works often …

Trust the Model Where It Trusts Itself--Model-Based Actor-Critic with Uncertainty-Aware Rollout Adaption

B Frauenknecht, A Eisele, D Subhasish… - arXiv preprint arXiv …, 2024 - arxiv.org
Dyna-style model-based reinforcement learning (MBRL) combines model-free agents with
predictive transition models through model-based rollouts. This combination raises a critical …

A model of how hierarchical representations constructed in the hippocampus are used to navigate through space

E Chalmers, M Bardal, R McDonald… - Adaptive …, 2024 - journals.sagepub.com
Animals can navigate through complex environments with amazing flexibility and efficiency:
they forage over large areas, quickly learning rewarding behavior and changing their plans …