How to fine-tune the model: Unified model shift and model bias policy optimization
Designing and deriving effective model-based reinforcement learning (MBRL) algorithms
with a performance improvement guarantee is challenging, mainly attributed to the high …
Query-policy misalignment in preference-based reinforcement learning
Preference-based reinforcement learning (PbRL) provides a natural way to align RL agents'
behavior with human-desired outcomes, but is often restrained by costly human feedback …
COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL
Dyna-style model-based reinforcement learning contains two phases: model rollouts to
generate samples for policy learning and real environment exploration using the current policy …
Understanding world models through multi-step pruning policy via reinforcement learning
In model-based reinforcement learning, the conventional approach to addressing world
model bias is to use gradient optimization methods. However, using a singular policy from …
Learning policy-aware models for model-based reinforcement learning via transition occupancy matching
Standard model-based reinforcement learning (MBRL) approaches fit a transition model of
the environment to all past experience, but this wastes model capacity on data that is …
A Unified View on Solving Objective Mismatch in Model-Based Reinforcement Learning
Model-based Reinforcement Learning (MBRL) aims to make agents more sample-efficient,
adaptive, and explainable by learning an explicit model of the environment. While the …
The primacy bias in Model-based RL
The primacy bias in deep reinforcement learning (DRL), which refers to the agent's tendency
to overfit early data and lose the ability to learn from new data, can significantly decrease the …
Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning
Model-based reinforcement learning (RL) has demonstrated remarkable successes
on a range of continuous control tasks due to its high sample efficiency. To save the …
Multi-Stage Balanced Distillation: Addressing Long-Tail Challenges in Sequence-Level Knowledge Distillation
Large language models (LLMs) have significantly advanced various natural language
processing tasks, but deploying them remains computationally expensive. Knowledge …
Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning
Monte Carlo Tree Search (MCTS) has recently emerged as a powerful technique for
enhancing the reasoning capabilities of LLMs. Techniques such as SFT or DPO have …