Combating the compounding-error problem with a multi-step model

FM Luo, T Xu, H Lai, XH Chen, W Zhang… - Science China Information …, 2024 - Springer

Reinforcement learning (RL) interacts with the environment to solve sequential
decision-making problems via a trial-and-error approach. Errors are always undesirable in real-world …

Cited by 102 · Related articles · All 4 versions

[PDF] neurips.cc

Mastering Atari games with limited data

W Ye, S Liu, T Kurutach, P Abbeel… - Advances in neural …, 2021 - proceedings.neurips.cc

Reinforcement learning has achieved great success in many applications. However, sample
efficiency remains a key challenge, with prominent methods requiring millions (or even …

Cited by 249 · Related articles · All 7 versions

Exploring chemical reaction space with machine learning models: Representation and feature perspective

Y Ding, B Qiang, Q Chen, Y Liu… - Journal of Chemical …, 2024 - ACS Publications

Chemical reactions serve as foundational building blocks for organic chemistry and drug
design. In the era of large AI models, data-driven approaches have emerged to innovate the …

Cited by 12 · Related articles · All 3 versions

[PDF] mlr.press

Bidirectional model-based policy optimization

H Lai, J Shen, W Zhang, Y Yu - International Conference on …, 2020 - proceedings.mlr.press

Model-based reinforcement learning approaches leverage a forward dynamics
model to support planning and decision making, which, however, may fail catastrophically if …

Cited by 65 · Related articles · All 8 versions

[PDF] neurips.cc

Mismatched no more: Joint model-policy optimization for model-based RL

B Eysenbach, A Khazatsky, S Levine… - Advances in …, 2022 - proceedings.neurips.cc

Many model-based reinforcement learning (RL) methods follow a similar template: fit a
model to previously observed data, and then use data from that model for RL or planning …

Cited by 48 · Related articles · All 8 versions

[PDF] nature.com

Learning continuous models for continuous physics

AS Krishnapriyan, AF Queiruga, NB Erichson… - Communications …, 2023 - nature.com

Dynamical systems that evolve continuously over time are ubiquitous throughout science
and engineering. Machine learning (ML) provides data-driven approaches to model and …

Cited by 34 · Related articles · All 10 versions

[PDF] neurips.cc

How to fine-tune the model: Unified model shift and model bias policy optimization

H Zhang, H Yu, J Zhao, D Zhang… - Advances in …, 2024 - proceedings.neurips.cc

Designing and deriving effective model-based reinforcement learning (MBRL) algorithms
with a performance improvement guarantee is challenging, mainly attributed to the high …

Cited by 4 · Related articles · All 5 versions

[PDF] mlr.press

Live in the moment: Learning dynamics model adapted to evolving policy

X Wang, W Wongkamjan, R Jia… - … on Machine Learning, 2023 - proceedings.mlr.press

Model-based reinforcement learning (RL) often achieves higher sample efficiency
in practice than model-free RL by learning a dynamics model to generate samples for policy …

Cited by 17 · Related articles · All 8 versions

[PDF] neurips.cc

gamma-models: Generative temporal difference learning for infinite-horizon prediction

M Janner, I Mordatch, S Levine - Advances in Neural …, 2020 - proceedings.neurips.cc

We introduce the gamma-model, a predictive model of environment dynamics with an
infinite, probabilistic horizon. Replacing standard single-step models with gamma-models …

Cited by 46 · Related articles · All 3 versions

[PDF] arxiv.org

Investigating compounding prediction errors in learned dynamics models

N Lambert, K Pister, R Calandra - arXiv preprint arXiv:2203.09637, 2022 - arxiv.org

Accurately predicting the consequences of agents' actions is a key prerequisite for planning
in robotic control. Model-based reinforcement learning (MBRL) is one paradigm which relies …

Cited by 32 · Related articles · All 2 versions