Model-free episodic control

C Blundell, B Uria, A Pritzel, Y Li, A Ruderman… - arXiv preprint arXiv …, 2016 - arxiv.org
State of the art deep reinforcement learning algorithms take many millions of interactions to
attain human-level performance. Humans, on the other hand, can very quickly exploit highly …

Generalizable episodic memory for deep reinforcement learning

H Hu, J Ye, G Zhu, Z Ren, C Zhang - arXiv preprint arXiv:2103.06469, 2021 - arxiv.org
Episodic memory-based methods can rapidly latch onto past successful strategies by a non-parametric
memory and improve sample efficiency of traditional reinforcement learning …

Neural episodic control

A Pritzel, B Uria, S Srinivasan… - International …, 2017 - proceedings.mlr.press
Deep reinforcement learning methods attain super-human performance in a wide range of
environments. Such methods are grossly inefficient, often taking orders of magnitudes more …

A unifying view of optimism in episodic reinforcement learning

G Neu, C Pike-Burke - Advances in Neural Information …, 2020 - proceedings.neurips.cc
The principle of "optimism in the face of uncertainty" underpins many theoretically successful
reinforcement learning algorithms. In this paper we provide a general framework for …

Episodic memory deep Q-networks

Z Lin, T Zhao, G Yang, L Zhang - arXiv preprint arXiv:1805.07603, 2018 - arxiv.org
Reinforcement learning (RL) algorithms have made huge progress in recent years by
leveraging the power of deep neural networks (DNN). Despite the success, deep RL …

Exploration via elliptical episodic bonuses

M Henaff, R Raileanu, M Jiang… - Advances in Neural …, 2022 - proceedings.neurips.cc
In recent years, a number of reinforcement learning (RL) methods have been proposed to
explore complex environments which differ across episodes. In this work, we show that the …

Recall traces: Backtracking models for efficient reinforcement learning

A Goyal, P Brakel, W Fedus, S Singhal… - arXiv preprint arXiv …, 2018 - arxiv.org
In many environments only a tiny subset of all states yield high reward. In these cases, few of
the interactions with the environment provide a relevant learning signal. Hence, we may …

Deep reinforcement learning amidst lifelong non-stationarity

A Xie, J Harrison, C Finn - arXiv preprint arXiv:2006.10701, 2020 - arxiv.org
As humans, our goals and our environment are persistently changing throughout our lifetime
based on our experiences, actions, and internal and external drives. In contrast, typical …

Learning to reinforcement learn

JX Wang, Z Kurth-Nelson, D Tirumala, H Soyer… - arXiv preprint arXiv …, 2016 - arxiv.org
In recent years deep reinforcement learning (RL) systems have attained superhuman
performance in a number of challenging task domains. However, a major limitation of such …

Tighter problem-dependent regret bounds in reinforcement learning without domain knowledge using value function bounds

A Zanette, E Brunskill - International Conference on Machine …, 2019 - proceedings.mlr.press
Strong worst-case performance bounds for episodic reinforcement learning exist but
fortunately in practice RL algorithms perform much better than such bounds would predict …