Adapting behaviour for learning progress

J Parker-Holder, A Pacchiano… - Advances in …, 2020 - proceedings.neurips.cc

Exploration is a key problem in reinforcement learning, since agents can only learn from
data they acquire in the environment. With that in mind, maintaining a population of agents is …

Cited by 166 · Related articles · All 8 versions

A self-tuning actor-critic algorithm

T Zahavy, Z Xu, V Veeriah, M Hessel… - Advances in neural …, 2020 - proceedings.neurips.cc

Reinforcement learning algorithms are highly sensitive to the choice of hyperparameters,
typically requiring significant manual effort to identify hyperparameters that perform well on a …

Cited by 90 · Related articles · All 6 versions

Tactical optimism and pessimism for deep reinforcement learning

T Moskovitz, J Parker-Holder… - Advances in …, 2021 - proceedings.neurips.cc

In recent years, deep off-policy actor-critic algorithms have become a dominant approach to
reinforcement learning for continuous control. One of the primary drivers of this improved …

Cited by 57 · Related articles · All 15 versions

The phenomenon of policy churn

T Schaul, A Barreto, J Quan… - Advances in Neural …, 2022 - proceedings.neurips.cc

We identify and study the phenomenon of policy churn, that is, the rapid change of the
greedy policy in value-based reinforcement learning. Policy churn operates at a surprisingly …

Cited by 25 · Related articles · All 5 versions

Automated reinforcement learning (autorl): A survey and open problems

J Parker-Holder, R Rajan, X Song, A Biedenkapp… - Journal of Artificial …, 2022 - jair.org

The combination of Reinforcement Learning (RL) with deep learning has led to a
series of impressive feats, with many believing (deep) RL provides a path towards generally …

Cited by 95 · Related articles · All 10 versions

Adaptive Anytime Multi-Agent Path Finding Using Bandit-Based Large Neighborhood Search

T Phan, T Huang, B Dilkina, S Koenig - Proceedings of the AAAI …, 2024 - ojs.aaai.org

Anytime multi-agent path finding (MAPF) is a promising approach to scalable path
optimization in large-scale multi-agent systems. State-of-the-art anytime MAPF is based on …

Cited by 10 · Related articles · All 4 versions

When do drivers concentrate? Attention-based driver behavior modeling with deep reinforcement learning

X Fu, F Gao, J Wu - arXiv preprint arXiv:2002.11385, 2020 - arxiv.org

Driver distraction is a significant risk to driving safety. Apart from the spatial domain, research on
temporal inattention is also necessary. This paper aims to figure out the pattern of drivers' …

Cited by 22 · Related articles · All 5 versions

Temporal difference uncertainties as a signal for exploration

S Flennerhag, JX Wang, P Sprechmann, F Visin… - arXiv preprint arXiv …, 2020 - arxiv.org

An effective approach to exploration in reinforcement learning is to rely on an agent's
uncertainty over the optimal policy, which can yield near-optimal exploration strategies in …

Cited by 16 · Related articles · All 3 versions

Foundations for Transfer in Reinforcement Learning: A Taxonomy of Knowledge Modalities

M Wulfmeier, A Byravan, S Bechtle, K Hausman… - arXiv preprint arXiv …, 2023 - arxiv.org

Contemporary artificial intelligence systems exhibit rapidly growing abilities accompanied by
the growth of required resources, expansive datasets and corresponding investments into …

Cited by 4 · Related articles · All 2 versions

Anytime Multi-Agent Path Finding with an Adaptive Delay-Based Heuristic

T Phan, B Zhang, SH Chan, S Koenig - arXiv preprint arXiv:2408.02960, 2024 - arxiv.org

Anytime multi-agent path finding (MAPF) is a promising approach to scalable path
optimization in multi-agent systems. MAPF-LNS, based on Large Neighborhood Search …

Cited by 1 · Related articles · All 3 versions