How to fine-tune the model: Unified model shift and model bias policy optimization

H Zhang, H Yu, J Zhao, D Zhang… - Advances in …, 2024 - proceedings.neurips.cc
Designing and deriving effective model-based reinforcement learning (MBRL) algorithms
with a performance improvement guarantee is challenging, largely due to the high …

Query-policy misalignment in preference-based reinforcement learning

X Hu, J Li, X Zhan, QS Jia, YQ Zhang - arXiv preprint arXiv:2305.17400, 2023 - arxiv.org
Preference-based reinforcement learning (PbRL) provides a natural way to align RL agents'
behavior with human desired outcomes, but is often restrained by costly human feedback …
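
Concretely, PbRL typically replaces a hand-specified reward with a reward model fit to pairwise human preferences under a Bradley-Terry likelihood. A minimal PyTorch sketch of that standard preference loss (names and shapes are illustrative assumptions, not this paper's code):

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a state-action pair to a scalar reward estimate."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def preference_loss(reward_model, seg_a, seg_b, prefs):
    """Bradley-Terry cross-entropy on pairs of trajectory segments.

    seg_a, seg_b: (obs, act) tuples of shape (batch, T, obs_dim) / (batch, T, act_dim).
    prefs: float tensor of shape (batch,) in {0., 1.}; 1 means segment A preferred.
    """
    obs_a, act_a = seg_a
    obs_b, act_b = seg_b
    # Sum predicted per-step rewards over each segment.
    ret_a = reward_model(obs_a, act_a).sum(dim=-1)
    ret_b = reward_model(obs_b, act_b).sum(dim=-1)
    # P(A preferred over B) under the Bradley-Terry model is sigmoid(ret_a - ret_b).
    return nn.functional.binary_cross_entropy_with_logits(ret_a - ret_b, prefs)
```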

COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL

X Wang, R Zheng, Y Sun, R Jia, W Wongkamjan… - arXiv preprint arXiv …, 2023 - arxiv.org
Dyna-style model-based reinforcement learning contains two phases: model rollouts to
generate samples for policy learning and real-environment exploration using the current policy …
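
The two phases named here reduce to a compact loop: collect real transitions with the current policy, refit the dynamics model, then roll the model out for short horizons to generate samples for the policy update. A minimal Python sketch, assuming generic env/model/policy/buffer interfaces (none of it COPlanner's actual code):

```python
def dyna_training_loop(env, model, policy, real_buffer, model_buffer,
                       n_iters=1000, rollout_horizon=5, rollouts_per_iter=400):
    """Schematic Dyna-style MBRL loop; all component interfaces are assumptions."""
    obs, _ = env.reset()
    for _ in range(n_iters):
        # Phase 1: real-environment exploration with the current policy.
        action = policy.act(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        real_buffer.add(obs, action, reward, next_obs, terminated)
        obs = env.reset()[0] if (terminated or truncated) else next_obs

        # Refit the learned dynamics model on real transitions.
        model.fit(real_buffer)

        # Phase 2: short model rollouts generate imagined samples for the policy.
        for s in real_buffer.sample_states(rollouts_per_iter):
            for _ in range(rollout_horizon):
                a = policy.act(s)
                s_next, r, done = model.predict(s, a)  # imagined transition
                model_buffer.add(s, a, r, s_next, done)
                if done:
                    break
                s = s_next

        # Policy update consumes the imagined transitions.
        policy.update(model_buffer)
```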

Understanding world models through multi-step pruning policy via reinforcement learning

Z He, W Qiu, W Zhao, X Shao, Z Liu - Information Sciences, 2025 - Elsevier
In model-based reinforcement learning, the conventional approach to addressing world
model bias is to use gradient optimization methods. However, using a singular policy from …

Learning policy-aware models for model-based reinforcement learning via transition occupancy matching

YJ Ma, K Sivakumar, J Yan, O Bastani… - … for Dynamics and …, 2023 - proceedings.mlr.press
Standard model-based reinforcement learning (MBRL) approaches fit a transition model of
the environment to all past experience, but this wastes model capacity on data that is …

A Unified View on Solving Objective Mismatch in Model-Based Reinforcement Learning

R Wei, N Lambert, A McDonald, A Garcia… - arXiv preprint arXiv …, 2023 - arxiv.org
Model-based Reinforcement Learning (MBRL) aims to make agents more sample-efficient,
adaptive, and explainable by learning an explicit model of the environment. While the …

The primacy bias in Model-based RL

Z Qiao, J Lyu, X Li - arXiv preprint arXiv:2310.15017, 2023 - arxiv.org
The primacy bias in deep reinforcement learning (DRL), which refers to the agent's tendency
to overfit early data and lose the ability to learn from new data, can significantly decrease the …
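
A common countermeasure in the primacy-bias literature (e.g., Nikishin et al., 2022, in model-free DRL) is to periodically re-initialize the agent's networks while keeping the replay buffer intact. A minimal sketch of that generic remedy, not necessarily the method proposed here:

```python
import torch.nn as nn

def maybe_reset(agent_net: nn.Module, step: int, reset_every: int = 200_000):
    """Periodically re-initialize network parameters while the replay buffer
    is kept, so the agent re-learns from all stored data instead of staying
    anchored to its earliest experience."""
    if step > 0 and step % reset_every == 0:
        for module in agent_net.modules():
            if hasattr(module, "reset_parameters"):
                module.reset_parameters()  # fresh init, same architecture
```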

Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning

C Li, R Jia, J Liu, Y Zhang, Y Niu, Y Yang, Y Liu… - ECAI 2023, 2023 - ebooks.iospress.nl
Model-based reinforcement learning (RL) has demonstrated remarkable successes
on a range of continuous control tasks due to its high sample efficiency. To save the …

Multi-Stage Balanced Distillation: Addressing Long-Tail Challenges in Sequence-Level Knowledge Distillation

Y Zhou, J Zhu, P Xu, X Liu, X Wang, D Koutra… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have significantly advanced various natural language
processing tasks, but deploying them remains computationally expensive. Knowledge …

Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning

X Wang, L Song, Y Tian, D Yu, B Peng, H Mi… - arXiv preprint arXiv …, 2024 - arxiv.org
Monte Carlo Tree Search (MCTS) has recently emerged as a powerful technique for
enhancing the reasoning capabilities of LLMs. Techniques such as SFT or DPO have …
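
For context on the snippet's acronyms: DPO (Direct Preference Optimization) trains a policy on preference pairs by maximizing its log-probability margin over a frozen reference model. A minimal sketch of the standard DPO loss (tensor names are illustrative, not this paper's code):

```python
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss on a batch of preference pairs.

    logp_*: summed log-probs of the chosen (w) / rejected (l) responses
    under the policy being trained; ref_logp_*: the same under the frozen
    reference model. All tensors have shape (batch,).
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) prefers the chosen response over the rejected one.
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -F.logsigmoid(beta * margin).mean()
```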