Mopo: Model-based offline policy optimization

RF Prudencio, MROA Maximo… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

With the widespread adoption of deep learning, reinforcement learning (RL) has
experienced a dramatic increase in popularity, scaling to previously intractable problems …

被引用次数：221 相关文章所有 9 个版本

Reinforcement learning algorithms: A brief survey

AK Shakya, G Pillai, S Chakrabarty - Expert Systems with Applications, 2023 - Elsevier

Reinforcement Learning (RL) is a machine learning (ML) technique to learn sequential
decision-making in complex problems. RL is inspired by trial-and-error based human/animal …

被引用次数：76 相关文章所有 2 个版本

[PDF] arxiv.org

Planning with diffusion for flexible behavior synthesis

M Janner, Y Du, JB Tenenbaum, S Levine - arXiv preprint arXiv …, 2022 - arxiv.org

Model-based reinforcement learning methods often use learning only for the purpose of
estimating an approximate dynamics model, offloading the rest of the decision-making work …

被引用次数：358 相关文章所有 4 个版本

[PDF] arxiv.org

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org

AI is undergoing a paradigm shift with the rise of models (eg, BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

被引用次数：3276 相关文章所有 2 个版本

[PDF] neurips.cc

Decision transformer: Reinforcement learning via sequence modeling

L Chen, K Lu, A Rajeswaran, K Lee… - Advances in neural …, 2021 - proceedings.neurips.cc

We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence
modeling problem. This allows us to draw upon the simplicity and scalability of the …

被引用次数：1316 相关文章所有 11 个版本

[PDF] neurips.cc

Offline reinforcement learning as one big sequence modeling problem

M Janner, Q Li, S Levine - Advances in neural information …, 2021 - proceedings.neurips.cc

Reinforcement learning (RL) is typically viewed as the problem of estimating single-step
policies (for model-free RL) or single-step models (for model-based RL), leveraging the …

被引用次数：620 相关文章所有 9 个版本

[PDF] neurips.cc

Model-based imitation learning for urban driving

A Hu, G Corrado, N Griffiths, Z Murez… - Advances in …, 2022 - proceedings.neurips.cc

An accurate model of the environment and the dynamic agents acting in it offers great
potential for improving motion planning. We present MILE: a Model-based Imitation …

被引用次数：93 相关文章所有 11 个版本

[PDF] neurips.cc

Uncertainty-based offline reinforcement learning with diversified q-ensemble

G An, S Moon, JH Kim… - Advances in neural …, 2021 - proceedings.neurips.cc

Offline reinforcement learning (offline RL), which aims to find an optimal policy from a
previously collected static dataset, bears algorithmic difficulties due to function …

被引用次数：232 相关文章所有 7 个版本

[PDF] arxiv.org

Mastering atari with discrete world models

D Hafner, T Lillicrap, M Norouzi, J Ba - arXiv preprint arXiv:2010.02193, 2020 - arxiv.org

Intelligent agents need to generalize from past experience to achieve goals in complex
environments. World models facilitate such generalization and allow learning behaviors …

被引用次数：736 相关文章所有 7 个版本

[PDF] mlr.press

Is pessimism provably efficient for offline rl?

Y Jin, Z Yang, Z Wang - International Conference on …, 2021 - proceedings.mlr.press

We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …

被引用次数：386 相关文章所有 7 个版本