Related articles - Academic resource search

Model-free episodic control

C Blundell, B Uria, A Pritzel, Y Li, A Ruderman… - arXiv preprint arXiv …, 2016 - arxiv.org

State of the art deep reinforcement learning algorithms take many millions of interactions to
attain human-level performance. Humans, on the other hand, can very quickly exploit highly …

Cited by 274 · Related articles · All 4 versions

[PDF] arxiv.org

Generalizable episodic memory for deep reinforcement learning

H Hu, J Ye, G Zhu, Z Ren, C Zhang - arXiv preprint arXiv:2103.06469, 2021 - arxiv.org

Episodic memory-based methods can rapidly latch onto past successful strategies by a non-
parametric memory and improve sample efficiency of traditional reinforcement learning …

Cited by 43 · Related articles · All 5 versions

[PDF] mlr.press

Neural episodic control

A Pritzel, B Uria, S Srinivasan… - International …, 2017 - proceedings.mlr.press

Deep reinforcement learning methods attain super-human performance in a wide range of
environments. Such methods are grossly inefficient, often taking orders of magnitudes more …

Cited by 420 · Related articles · All 5 versions

[PDF] neurips.cc

A unifying view of optimism in episodic reinforcement learning

G Neu, C Pike-Burke - Advances in Neural Information …, 2020 - proceedings.neurips.cc

The principle of "optimism in the face of uncertainty" underpins many theoretically successful
reinforcement learning algorithms. In this paper we provide a general framework for …

Cited by 76 · Related articles · All 11 versions

[PDF] arxiv.org

Episodic memory deep Q-networks

Z Lin, T Zhao, G Yang, L Zhang - arXiv preprint arXiv:1805.07603, 2018 - arxiv.org

Reinforcement learning (RL) algorithms have made huge progress in recent years by
leveraging the power of deep neural networks (DNN). Despite the success, deep RL …

Cited by 103 · Related articles · All 4 versions

[PDF] neurips.cc

Exploration via elliptical episodic bonuses

M Henaff, R Raileanu, M Jiang… - Advances in Neural …, 2022 - proceedings.neurips.cc

In recent years, a number of reinforcement learning (RL) methods have been proposed to
explore complex environments which differ across episodes. In this work, we show that the …

Cited by 30 · Related articles · All 6 versions

[PDF] arxiv.org

Recall traces: Backtracking models for efficient reinforcement learning

A Goyal, P Brakel, W Fedus, S Singhal… - arXiv preprint arXiv …, 2018 - arxiv.org

In many environments only a tiny subset of all states yield high reward. In these cases, few of
the interactions with the environment provide a relevant learning signal. Hence, we may …

Cited by 79 · Related articles · All 6 versions

[PDF] arxiv.org

Deep reinforcement learning amidst lifelong non-stationarity

A Xie, J Harrison, C Finn - arXiv preprint arXiv:2006.10701, 2020 - arxiv.org

As humans, our goals and our environment are persistently changing throughout our lifetime
based on our experiences, actions, and internal and external drives. In contrast, typical …

Cited by 69 · Related articles · All 3 versions

[PDF] ucl.ac.uk

Learning to reinforcement learn

JX Wang, Z Kurth-Nelson, D Tirumala, H Soyer… - arXiv preprint arXiv …, 2016 - arxiv.org

In recent years deep reinforcement learning (RL) systems have attained superhuman
performance in a number of challenging task domains. However, a major limitation of such …

Cited by 1063 · Related articles · All 8 versions

[PDF] mlr.press

Tighter problem-dependent regret bounds in reinforcement learning without domain knowledge using value function bounds

A Zanette, E Brunskill - International Conference on Machine …, 2019 - proceedings.mlr.press

Strong worst-case performance bounds for episodic reinforcement learning exist but
fortunately in practice RL algorithms perform much better than such bounds would predict …

Cited by 305 · Related articles · All 8 versions