A survey of learning in multiagent environments: Dealing with non-stationarity

P Hernandez-Leal, M Kaisers, T Baarslag… - arXiv preprint arXiv …, 2017 - arxiv.org
The key challenge in multiagent learning is learning a best response to the behaviour of
other agents, which may be non-stationary: if the other agents adapt their strategy as well …

Multi-objective multi-agent decision making: a utility-based analysis and survey

R Rădulescu, P Mannion, DM Roijers… - Autonomous Agents and …, 2020 - Springer
The majority of multi-agent system implementations aim to optimise agents' policies with
respect to a single objective, despite the fact that many real-world problem domains are …

Convergent policy optimization for safe reinforcement learning

M Yu, Z Yang, M Kolar, Z Wang - Advances in Neural …, 2019 - proceedings.neurips.cc
We study the safe reinforcement learning problem with nonlinear function approximation,
where policy optimization is formulated as a constrained optimization problem with both the …

Multi-agent path finding with delay probabilities

H Ma, TKS Kumar, S Koenig - Proceedings of the AAAI Conference on …, 2017 - ojs.aaai.org
Several recently developed Multi-Agent Path Finding (MAPF) solvers scale to large
MAPF instances by searching for MAPF plans on 2 levels: The high-level search resolves …

[BOOK][B] Multi-objective decision making

DM Roijers, S Whiteson, R Brachman, P Stone - 2017 - Springer
Many real-world decision problems have multiple objectives. For example, when choosing a
medical treatment plan, we want to maximize the efficacy of the treatment, but also minimize …

Constrained multiagent Markov decision processes: A taxonomy of problems and algorithms

F De Nijs, E Walraven, M De Weerdt, M Spaan - Journal of Artificial …, 2021 - jair.org
In domains such as electric vehicle charging, smart distribution grids and autonomous
warehouses, multiple agents share the same resources. When planning the use of these …

Simultaneous task allocation and planning under uncertainty

F Faruq, D Parker, B Lacerda… - 2018 IEEE/RSJ …, 2018 - ieeexplore.ieee.org
We propose novel techniques for task allocation and planning in multi-robot systems
operating in uncertain environments. Task allocation is performed simultaneously with …

Finite-time frequentist regret bounds of multi-agent Thompson sampling on sparse hypergraphs

T Jin, HL Hsu, W Chang, P Xu - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
We study the multi-agent multi-armed bandit (MAMAB) problem, where agents are factored
into overlapping groups. Each group represents a hyperedge, forming a hypergraph over …

A prioritized planning algorithm of trajectory coordination based on time windows for multiple AGVs with delay disturbance

R Tai, J Wang, W Chen - Assembly Automation, 2019 - emerald.com
Purpose: In the running of multiple automated guided vehicles (AGVs) in warehouses, delay
problems in motions happen unavoidably as there might exist some disabled components of …

Multi-agent Thompson sampling for bandit applications with sparse neighbourhood structures

T Verstraeten, E Bargiacchi, PJK Libin, J Helsen… - Scientific reports, 2020 - nature.com
Multi-agent coordination is prevalent in many real-world applications. However, such
coordination is challenging due to its combinatorial nature. An important observation in this …