Offline reinforcement learning: Tutorial, review, and perspectives on open problems

S Levine, A Kumar, G Tucker, J Fu - arXiv preprint arXiv:2005.01643, 2020 - arxiv.org

In this tutorial article, we aim to provide the reader with the conceptual tools needed to get
started on research on offline reinforcement learning algorithms: reinforcement learning …

Cited by 1741 Related articles All 3 versions

Path planning and obstacle avoidance for AUV: A review

C Cheng, Q Sha, B He, G Li - Ocean Engineering, 2021 - Elsevier

Autonomous underwater vehicle plays a more and more important role in the exploration of
marine resources. Path planning and obstacle avoidance is the core technology to realize …

Cited by 164 Related articles All 3 versions

[PDF] neurips.cc

Offline reinforcement learning as one big sequence modeling problem

M Janner, Q Li, S Levine - Advances in neural information …, 2021 - proceedings.neurips.cc

Reinforcement learning (RL) is typically viewed as the problem of estimating single-step
policies (for model-free RL) or single-step models (for model-based RL), leveraging the …

Cited by 620 Related articles All 9 versions

[PDF] jsdelivr.net

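[PDF] A survey on deep reinforcement learning (深度强化学习综述)

刘全，翟建伟，章宗长，钟珊，周倩，章鹏，徐进 - Chinese Journal of Computers (计算机学报), 2018 - cdn.jsdelivr.net

Reinforcement learning learns a mapping from environment states to actions so as to obtain the maximum reward signal. In large-scale …
Chinese Journal of Computers, Vol. 40, 2017, online publication No. 1 …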

Cited by 104 Related articles All 6 versions

[PDF] arxiv.org

Mastering Atari, Go, chess and shogi by planning with a learned model

J Schrittwieser, I Antonoglou, T Hubert, K Simonyan… - Nature, 2020 - nature.com

Constructing agents with planning capabilities has long been one of the main challenges in
the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge …

Cited by 2330 Related articles All 15 versions

[PDF] nowpublishers.com

[CITATION] An introduction to variational autoencoders

DP Kingma, M Welling - Foundations and Trends® in …, 2019 - nowpublishers.com

An Introduction to Variational Autoencoders. Other titles in Foundations and Trends® in Machine Learning: Computational Optimal …

Cited by 2651 Related articles All 11 versions

[PDF] neurips.cc

When to trust your model: Model-based policy optimization

M Janner, J Fu, M Zhang… - Advances in neural …, 2019 - proceedings.neurips.cc

Designing effective model-based reinforcement learning algorithms is difficult because the
ease of data generation must be weighed against the bias of model-generated data. In this …

Cited by 930 Related articles All 10 versions

[PDF] neurips.cc

Critic regularized regression

Z Wang, A Novikov, K Zolna, JS Merel… - Advances in …, 2020 - proceedings.neurips.cc

Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy
optimization from large pre-recorded datasets without online environment interaction. It …

Cited by 301 Related articles All 9 versions

[PDF] arxiv.org

Soft actor-critic algorithms and applications

T Haarnoja, A Zhou, K Hartikainen, G Tucker… - arXiv preprint arXiv …, 2018 - arxiv.org

Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a
range of challenging sequential decision making and control tasks. However, these methods …

Cited by 2587 Related articles All 4 versions

[PDF] arxiv.org

Model-based reinforcement learning for Atari

L Kaiser, M Babaeizadeh, P Milos, B Osinski… - arXiv preprint arXiv …, 2019 - arxiv.org

Model-free reinforcement learning (RL) can be used to learn effective policies for complex
tasks, such as Atari games, even from image observations. However, this typically requires …

Cited by 917 Related articles All 6 versions