Effective diversity in population based reinforcement learning

J Parker-Holder, A Pacchiano… - Advances in …, 2020 - proceedings.neurips.cc
Exploration is a key problem in reinforcement learning, since agents can only learn from
data they acquire in the environment. With that in mind, maintaining a population of agents is …

A self-tuning actor-critic algorithm

T Zahavy, Z Xu, V Veeriah, M Hessel… - Advances in neural …, 2020 - proceedings.neurips.cc
Reinforcement learning algorithms are highly sensitive to the choice of hyperparameters,
typically requiring significant manual effort to identify hyperparameters that perform well on a …

Tactical optimism and pessimism for deep reinforcement learning

T Moskovitz, J Parker-Holder… - Advances in …, 2021 - proceedings.neurips.cc
In recent years, deep off-policy actor-critic algorithms have become a dominant approach to
reinforcement learning for continuous control. One of the primary drivers of this improved …

The phenomenon of policy churn

T Schaul, A Barreto, J Quan… - Advances in Neural …, 2022 - proceedings.neurips.cc
We identify and study the phenomenon of policy churn, that is, the rapid change of the
greedy policy in value-based reinforcement learning. Policy churn operates at a surprisingly …

Automated reinforcement learning (AutoRL): A survey and open problems

J Parker-Holder, R Rajan, X Song, A Biedenkapp… - Journal of Artificial …, 2022 - jair.org
The combination of Reinforcement Learning (RL) with deep learning has led to a
series of impressive feats, with many believing (deep) RL provides a path towards generally …

Adaptive Anytime Multi-Agent Path Finding Using Bandit-Based Large Neighborhood Search

T Phan, T Huang, B Dilkina, S Koenig - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Anytime multi-agent path finding (MAPF) is a promising approach to scalable path
optimization in large-scale multi-agent systems. State-of-the-art anytime MAPF is based on …

When do drivers concentrate? Attention-based driver behavior modeling with deep reinforcement learning

X Fu, F Gao, J Wu - arXiv preprint arXiv:2002.11385, 2020 - arxiv.org
Driver distraction is a significant risk to driving safety. Apart from the spatial domain, research on
temporal inattention is also necessary. This paper aims to figure out the pattern of drivers' …

Temporal difference uncertainties as a signal for exploration

S Flennerhag, JX Wang, P Sprechmann, F Visin… - arXiv preprint arXiv …, 2020 - arxiv.org
An effective approach to exploration in reinforcement learning is to rely on an agent's
uncertainty over the optimal policy, which can yield near-optimal exploration strategies in …

Foundations for Transfer in Reinforcement Learning: A Taxonomy of Knowledge Modalities

M Wulfmeier, A Byravan, S Bechtle, K Hausman… - arXiv preprint arXiv …, 2023 - arxiv.org
Contemporary artificial intelligence systems exhibit rapidly growing abilities accompanied by
the growth of required resources, expansive datasets and corresponding investments into …

Anytime Multi-Agent Path Finding with an Adaptive Delay-Based Heuristic

T Phan, B Zhang, SH Chan, S Koenig - arXiv preprint arXiv:2408.02960, 2024 - arxiv.org
Anytime multi-agent path finding (MAPF) is a promising approach to scalable path
optimization in multi-agent systems. MAPF-LNS, based on Large Neighborhood Search …