Near-optimal collaborative learning in bandits

C Réda, S Vakili, E Kaufmann - Advances in Neural …, 2022 - proceedings.neurips.cc
This paper introduces a general multi-agent bandit model in which each agent is facing a
finite set of arms and may communicate with other agents through a central controller in …

Active coverage for PAC reinforcement learning

A Al-Marjani, A Tirinzoni… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
Collecting and leveraging data with good coverage properties plays a crucial role in different
aspects of reinforcement learning (RL), including reward-free exploration and offline …

Information-directed selection for top-two algorithms

W You, C Qin, Z Wang, S Yang - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We consider the best-k-arm identification problem for multi-armed bandits, where the
objective is to select the exact set of k arms with the highest mean rewards by sequentially …
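As a concrete baseline for this objective (not the information-directed selection rule studied in the paper), the following Python sketch pulls every arm equally often and returns the k arms with the highest empirical means; the arm means, Gaussian noise model, and sample budget are illustrative assumptions.

```python
import numpy as np

def best_k_arms_uniform(means, k, samples_per_arm=2000, seed=0):
    """Estimate the top-k arms by pulling every arm equally often.

    A naive illustration of the best-k-arm objective: pull each arm,
    form empirical means, and return the k arms with the largest estimates.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(means)
    # Gaussian rewards with unit variance (an assumption for illustration).
    rewards = rng.normal(loc=np.asarray(means)[:, None], scale=1.0,
                         size=(n_arms, samples_per_arm))
    empirical_means = rewards.mean(axis=1)
    # The answer is the set of k arms with the highest empirical means.
    return set(np.argsort(empirical_means)[-k:])

# Example: identify the 2 best of 5 arms.
print(best_k_arms_uniform([0.1, 0.3, 0.5, 0.7, 0.9], k=2))  # {3, 4}
```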

On elimination strategies for bandit fixed-confidence identification

A Tirinzoni, R Degenne - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Elimination algorithms for bandit identification, which prune the plausible correct answers
sequentially until only one remains, are computationally convenient since they reduce the …
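To illustrate the general elimination template (not the specific strategies compared in the paper), here is a minimal sketch of successive elimination for fixed-confidence best-arm identification; the Gaussian reward model, confidence level, and confidence-width constants are assumptions.

```python
import numpy as np

def successive_elimination(means, delta=0.05, seed=0):
    """Fixed-confidence best-arm identification by elimination.

    Keeps a set of plausible best arms, pulls each survivor once per round,
    and eliminates any arm whose upper confidence bound falls below the best
    lower confidence bound, until a single arm remains.
    """
    rng = np.random.default_rng(seed)
    n = len(means)
    active = list(range(n))
    sums = np.zeros(n)
    counts = np.zeros(n)
    t = 0
    while len(active) > 1:
        t += 1
        for a in active:
            sums[a] += rng.normal(means[a], 1.0)  # unit-variance rewards (assumed)
            counts[a] += 1
        mu_hat = sums[active] / counts[active]
        # Anytime Hoeffding-style width; the exact constant is illustrative.
        width = np.sqrt(2 * np.log(4 * n * t ** 2 / delta) / counts[active])
        best_lcb = np.max(mu_hat - width)
        # Prune arms that are no longer plausibly the best.
        active = [a for a, m, w in zip(active, mu_hat, width) if m + w >= best_lcb]
    return active[0]

print(successive_elimination([0.2, 0.4, 0.8]))  # typically returns 2
```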

Optimal clustering with bandit feedback

J Yang, Z Zhong, VYF Tan - Journal of Machine Learning Research, 2024 - jmlr.org
This paper considers the problem of online clustering with bandit feedback. A set of arms (or
items) can be partitioned into various groups that are unknown. Within each group, the …
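As a toy illustration of this setup (not the optimal algorithm developed in the paper), the sketch below pulls every arm uniformly and groups arms by cutting the sorted empirical means at the largest gaps; the group structure, noise model, and sample budget are assumptions.

```python
import numpy as np

def cluster_arms_by_mean(true_means, n_groups, samples_per_arm=3000, seed=0):
    """Naive clustering of arms from bandit feedback.

    Pulls every arm equally often, then groups arms whose empirical means
    are close by cutting the sorted means at the (n_groups - 1) largest gaps.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means)
    samples = rng.normal(true_means[:, None], 1.0,
                         size=(len(true_means), samples_per_arm))
    mu_hat = samples.mean(axis=1)
    order = np.argsort(mu_hat)
    gaps = np.diff(mu_hat[order])
    # Cut at the largest gaps to form n_groups clusters of arms.
    cut_positions = set(np.argsort(gaps)[-(n_groups - 1):])
    labels = np.empty(len(true_means), dtype=int)
    group = 0
    for rank, arm in enumerate(order):
        labels[arm] = group
        if rank in cut_positions:
            group += 1
    return labels

# Arms 0-2 share one mean, arms 3-5 another (illustrative group structure).
print(cluster_arms_by_mean([0.1, 0.1, 0.1, 1.0, 1.0, 1.0], n_groups=2))
```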

Towards instance-optimality in online PAC reinforcement learning

A Al-Marjani, A Tirinzoni, E Kaufmann - arXiv preprint arXiv:2311.05638, 2023 - arxiv.org
Several recent works have proposed instance-dependent upper bounds on the number of
episodes needed to identify, with probability $1-\delta$, an $\varepsilon$-optimal policy in …

Optimal Regret Bounds for Collaborative Learning in Bandits

A Shidani, S Vakili - International Conference on Algorithmic …, 2024 - proceedings.mlr.press
We consider regret minimization in a general collaborative multi-agent multi-armed bandit
model, in which each agent faces a finite set of arms and may communicate with other …

On the complexity of representation learning in contextual linear bandits

A Tirinzoni, M Pirotta, A Lazaric - … Conference on Artificial …, 2023 - proceedings.mlr.press
In contextual linear bandits, the reward function is assumed to be a linear combination of an
unknown reward vector and a given embedding of context-arm pairs. In practice, the …
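For concreteness, the model described above posits rewards of the form r = <theta, phi(context, arm)> + noise for an unknown vector theta; the sketch below recovers theta by ridge regression from logged feature-reward pairs, with the feature dimension, regularization, and noise level chosen purely for illustration (the paper's representation-learning question is not addressed here).

```python
import numpy as np

def ridge_estimate(features, rewards, reg=1.0):
    """Estimate the unknown reward vector theta in a linear bandit model.

    Assumes observed rewards r_t = <theta, phi(x_t, a_t)> + noise and returns
    the regularized least-squares estimate
    theta_hat = (Phi^T Phi + reg * I)^{-1} Phi^T r.
    """
    phi = np.asarray(features)   # shape (n_samples, d)
    r = np.asarray(rewards)      # shape (n_samples,)
    d = phi.shape[1]
    gram = phi.T @ phi + reg * np.eye(d)
    return np.linalg.solve(gram, phi.T @ r)

# Example with a known theta and random context-arm embeddings (assumptions).
rng = np.random.default_rng(0)
theta = np.array([0.5, -0.2, 0.3])
phi = rng.normal(size=(500, 3))
rewards = phi @ theta + 0.1 * rng.normal(size=500)
print(ridge_estimate(phi, rewards))  # close to theta
```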

Choosing Answers in Epsilon-Best-Answer Identification for Linear Bandits

M Jourdan, R Degenne - International Conference on …, 2022 - proceedings.mlr.press
In pure-exploration problems, information is gathered sequentially to answer a question on
the stochastic environment. While best-arm identification for linear bandits has been …

Dual-Directed Algorithm Design for Efficient Pure Exploration

C Qin, W You - arXiv preprint arXiv:2310.19319, 2023 - arxiv.org
We consider pure-exploration problems in the context of stochastic sequential adaptive
experiments with a finite set of alternative options. The goal of the decision-maker is to …