Model-based policy optimization with unsupervised model adaptation

FM Luo, T Xu, H Lai, XH Chen, W Zhang… - Science China Information …, 2024 - Springer

Reinforcement learning (RL) interacts with the environment to solve sequential
decision-making problems via a trial-and-error approach. Errors are always undesirable in real-world …

Cited by 102 · Related articles · All 4 versions

A review of deep reinforcement learning approaches for smart manufacturing in industry 4.0 and 5.0 framework

A del Real Torres, DS Andreiana, Á Ojeda Roldán… - Applied Sciences, 2022 - mdpi.com

In this review, the industry's current issues regarding intelligent manufacture are presented.
This work presents the status and the potential for the I4.0 and I5.0's revolutionary …

Cited by 54 · Related articles · All 5 versions

Cooperative exploration for multi-agent deep reinforcement learning

IJ Liu, U Jain, RA Yeh… - … conference on machine …, 2021 - proceedings.mlr.press

Exploration is critical for good results in deep reinforcement learning and has attracted much
attention. However, existing multi-agent deep reinforcement learning algorithms still use …

Cited by 119 · Related articles · All 9 versions

Dropout Q-functions for doubly efficient reinforcement learning

T Hiraoka, T Imagawa, T Hashimoto, T Onishi… - arXiv preprint arXiv …, 2021 - arxiv.org

Randomized ensembled double Q-learning (REDQ) (Chen et al., 2021b) has recently
achieved state-of-the-art sample efficiency on continuous-action reinforcement learning …

Cited by 110 · Related articles · All 4 versions

Cross-domain policy adaptation via value-guided data filtering

K Xu, C Bai, X Ma, D Wang, B Zhao… - Advances in …, 2023 - proceedings.neurips.cc

Generalizing policies across different domains with dynamics mismatch poses a significant
challenge in reinforcement learning. For example, a robot learns the policy in a simulator …

Cited by 13 · Related articles · All 5 versions

Live in the moment: Learning dynamics model adapted to evolving policy

X Wang, W Wongkamjan, R Jia… - … on Machine Learning, 2023 - proceedings.mlr.press

Model-based reinforcement learning (RL) often achieves higher sample efficiency
in practice than model-free RL by learning a dynamics model to generate samples for policy …

Cited by 17 · Related articles · All 8 versions

Weighted model estimation for offline model-based reinforcement learning

T Hishinuma, K Senda - Advances in neural information …, 2021 - proceedings.neurips.cc

This paper discusses model estimation in offline model-based reinforcement learning
(MBRL), which is important for subsequent policy improvement using an estimated model …

Cited by 15 · Related articles · All 7 versions

COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL

X Wang, R Zheng, Y Sun, R Jia, W Wongkamjan… - arXiv preprint arXiv …, 2023 - arxiv.org

Dyna-style model-based reinforcement learning contains two phases: model rollouts to
generate sample for policy learning and real environment exploration using current policy …

Cited by 6 · Related articles · All 5 versions

Adaptation augmented model-based policy optimization

J Shen, H Lai, M Liu, H Zhao, Y Yu, W Zhang - Journal of Machine …, 2023 - jmlr.org

Compared to model-free reinforcement learning (RL), model-based RL is often more sample
efficient by leveraging a learned dynamics model to help decision making. However, the …

Cited by 2 · Related articles

On effective scheduling of model-based reinforcement learning

H Lai, J Shen, W Zhang, Y Huang… - Advances in …, 2021 - proceedings.neurips.cc

Model-based reinforcement learning has attracted wide attention due to its superior
sample efficiency. Despite its impressive success so far, it is still unclear how to …

Cited by 14 · Related articles · All 6 versions