Effective diversity in population based reinforcement learning
J Parker-Holder, A Pacchiano… - Advances in …, 2020 - proceedings.neurips.cc
Exploration is a key problem in reinforcement learning, since agents can only learn from
data they acquire in the environment. With that in mind, maintaining a population of agents is …
A self-tuning actor-critic algorithm
Reinforcement learning algorithms are highly sensitive to the choice of hyperparameters,
typically requiring significant manual effort to identify hyperparameters that perform well on a …
Tactical optimism and pessimism for deep reinforcement learning
T Moskovitz, J Parker-Holder… - Advances in …, 2021 - proceedings.neurips.cc
In recent years, deep off-policy actor-critic algorithms have become a dominant approach to
reinforcement learning for continuous control. One of the primary drivers of this improved …
The phenomenon of policy churn
We identify and study the phenomenon of policy churn, that is, the rapid change of the
greedy policy in value-based reinforcement learning. Policy churn operates at a surprisingly …
Automated reinforcement learning (autorl): A survey and open problems
The combination of Reinforcement Learning (RL) with deep learning has led to a
series of impressive feats, with many believing (deep) RL provides a path towards generally …
Adaptive Anytime Multi-Agent Path Finding Using Bandit-Based Large Neighborhood Search
Anytime multi-agent path finding (MAPF) is a promising approach to scalable path
optimization in large-scale multi-agent systems. State-of-the-art anytime MAPF is based on …
When do drivers concentrate? Attention-based driver behavior modeling with deep reinforcement learning
X Fu, F Gao, J Wu - arXiv preprint arXiv:2002.11385, 2020 - arxiv.org
Driver distraction is a significant risk to driving safety. Apart from the spatial domain, research on
temporal inattention is also necessary. This paper aims to figure out the pattern of drivers' …
Temporal difference uncertainties as a signal for exploration
An effective approach to exploration in reinforcement learning is to rely on an agent's
uncertainty over the optimal policy, which can yield near-optimal exploration strategies in …
Foundations for Transfer in Reinforcement Learning: A Taxonomy of Knowledge Modalities
Contemporary artificial intelligence systems exhibit rapidly growing abilities accompanied by
the growth of required resources, expansive datasets and corresponding investments into …
Anytime Multi-Agent Path Finding with an Adaptive Delay-Based Heuristic
Anytime multi-agent path finding (MAPF) is a promising approach to scalable path
optimization in multi-agent systems. MAPF-LNS, based on Large Neighborhood Search …