Optimal Exploration is no harder than Thompson Sampling

Z Li, K Jamieson, L Jain - International Conference on …, 2024 - proceedings.mlr.press
Given a set of arms $\mathcal{Z} \subset \mathbb{R}^d$ and an unknown parameter vector
$\theta_* \in \mathbb{R}^d$, the pure exploration linear bandits problem aims to return …

Information-directed selection for top-two algorithms

W You, C Qin, Z Wang, S Yang - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We consider the best-k-arm identification problem for multi-armed bandits, where the
objective is to select the exact set of k arms with the highest mean rewards by sequentially …

Pure Exploration under Mediators' Feedback

R Poiani, AM Metelli, M Restelli - arXiv preprint arXiv:2308.15552, 2023 - arxiv.org
Stochastic multi-armed bandits are a sequential decision-making framework in which, at each
interaction step, the learner selects an arm and observes a stochastic reward. Within the …
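The interaction protocol described in this snippet (select an arm, observe a stochastic reward, repeat) can be sketched minimally as follows. This is an illustrative toy, not any paper's method: the Gaussian rewards, the uniform selection rule, and the function names are all assumptions.

```python
import random

def pull(arm_means, arm, rng):
    """Observe a stochastic reward for the chosen arm (assumed Gaussian, unit variance)."""
    return rng.gauss(arm_means[arm], 1.0)

def interact(arm_means, steps, seed=0):
    """Sequential interaction loop: at each step the learner selects an arm
    and observes a stochastic reward; returns the (arm, reward) history."""
    rng = random.Random(seed)
    history = []
    for _ in range(steps):
        arm = rng.randrange(len(arm_means))  # placeholder: uniform selection rule
        reward = pull(arm_means, arm, rng)
        history.append((arm, reward))
    return history
```

A pure-exploration learner would replace the uniform selection rule with an adaptive one and, after the loop, output a recommendation rather than maximize reward.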

Dual-Directed Algorithm Design for Efficient Pure Exploration

C Qin, W You - arXiv preprint arXiv:2310.19319, 2023 - arxiv.org
We consider pure-exploration problems in the context of stochastic sequential adaptive
experiments with a finite set of alternative options. The goal of the decision-maker is to …

Active clustering with bandit feedback

V Thuot, A Carpentier, C Giraud, N Verzelen - arXiv preprint arXiv …, 2024 - arxiv.org
We investigate the Active Clustering Problem (ACP). A learner interacts with an $N$-armed
stochastic bandit with $d$-dimensional sub-Gaussian feedback. There exists a hidden …

An Anytime Algorithm for Good Arm Identification

M Jourdan, C Réda - arXiv preprint arXiv:2310.10359, 2023 - arxiv.org
In good arm identification (GAI), the goal is to identify an arm whose average performance
exceeds a given threshold, referred to as a good arm, if one exists. Few works have studied GAI …