Q(λ) with off-policy corrections

T Zhang, H Mo - International Journal of Advanced Robotic …, 2021 - journals.sagepub.com

Applying the learning mechanism of natural living beings to endow intelligent robots with
humanoid perception and decision-making wisdom becomes an important force to promote …

Cited by 105 - Related articles

[PDF] hust.edu.vn

[BOOK][B] Algorithms for decision making

MJ Kochenderfer, TA Wheeler, KH Wray - 2022 - books.google.com

A broad introduction to algorithms for decision making under uncertainty, introducing the
underlying mathematical problem formulations and the algorithms for solving them …

Cited by 183 - Related articles - All 8 versions

[PDF] nowpublishers.com

An introduction to deep reinforcement learning

V François-Lavet, P Henderson, R Islam… - … and Trends® in …, 2018 - nowpublishers.com

Deep reinforcement learning is the combination of reinforcement learning (RL) and deep
learning. This field of research has been able to solve a wide range of complex …

Cited by 1781 - Related articles - All 16 versions

[PDF] mlr.press

Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures

L Espeholt, H Soyer, R Munos… - International …, 2018 - proceedings.mlr.press

In this work we aim to solve a large collection of tasks using a single reinforcement learning
agent with a single set of parameters. A key challenge is to handle the increased amount of …

Cited by 1639 - Related articles - All 8 versions

[PDF] e-tarjome.com

A review of machine learning for new generation smart dispatch in power systems

L Yin, Q Gao, L Zhao, B Zhang, T Wang, S Li… - … Applications of Artificial …, 2020 - Elsevier

This paper analyzes the characteristics and challenges of the new generation smart
dispatch systems, and proposes the framework of smart dispatch. Secondly, the …

Cited by 68 - Related articles - All 3 versions

[PDF] mlr.press

A distributional perspective on reinforcement learning

MG Bellemare, W Dabney… - … conference on machine …, 2017 - proceedings.mlr.press

In this paper we argue for the fundamental importance of the value distribution: the
distribution of the random return received by a reinforcement learning agent. This is in …

Cited by 1793 - Related articles - All 8 versions

[PDF] jair.org

Revisiting the arcade learning environment: Evaluation protocols and open problems for general agents

MC Machado, MG Bellemare, E Talvitie… - Journal of Artificial …, 2018 - jair.org

The Arcade Learning Environment (ALE) is an evaluation platform that poses the challenge
of building AI agents with general competency across dozens of Atari 2600 games. It …

Cited by 625 - Related articles - All 14 versions

[PDF] arxiv.org

Sample efficient actor-critic with experience replay

Z Wang, V Bapst, N Heess, V Mnih, R Munos… - arXiv preprint arXiv …, 2016 - arxiv.org

This paper presents an actor-critic deep reinforcement learning agent with experience
replay that is stable, sample efficient, and performs remarkably well on challenging …

Cited by 1002 - Related articles - All 6 versions

[PDF] neurips.cc

Safe and efficient off-policy reinforcement learning

R Munos, T Stepleton… - Advances in neural …, 2016 - proceedings.neurips.cc

In this work, we take a fresh look at some old and new algorithms for off-policy, return-based
reinforcement learning. Expressing these in a common form, we derive a novel algorithm …

Cited by 724 - Related articles - All 10 versions

[PDF] neurips.cc

Meta-gradient reinforcement learning

Z Xu, HP van Hasselt, D Silver - Advances in neural …, 2018 - proceedings.neurips.cc

The goal of reinforcement learning algorithms is to estimate and/or optimise the value
function. However, unlike supervised learning, no teacher or oracle is available to provide …

Cited by 358 - Related articles - All 7 versions