Optimal Exploration is no harder than Thompson Sampling

Z Li, K Jamieson, L Jain - International Conference on …, 2024 - proceedings.mlr.press
Given a set of arms $\mathcal{Z} \subset \mathbb{R}^d$ and an unknown parameter vector
$\theta_* \in \mathbb{R}^d$, the pure exploration linear bandits problem aims to return …

Information-directed selection for top-two algorithms

W You, C Qin, Z Wang, S Yang - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We consider the best-k-arm identification problem for multi-armed bandits, where the
objective is to select the exact set of k arms with the highest mean rewards by sequentially …

Pure Exploration under Mediators' Feedback

R Poiani, AM Metelli, M Restelli - arXiv preprint arXiv:2308.15552, 2023 - arxiv.org
Stochastic multi-armed bandits are a sequential decision-making framework in which, at each
interaction step, the learner selects an arm and observes a stochastic reward. Within the …
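The interaction protocol described in this snippet (select an arm, observe a stochastic reward, repeat) can be sketched minimally as follows. This is an illustrative toy, not any paper's method: the Gaussian rewards, the uniform selection rule, and the function names are all assumptions.

```python
import random

def pull(arm_means, arm, rng):
    """Observe a stochastic reward for the chosen arm (assumed Gaussian, unit variance)."""
    return rng.gauss(arm_means[arm], 1.0)

def interact(arm_means, steps, seed=0):
    """Sequential interaction loop: at each step the learner selects an arm
    and observes a stochastic reward; returns the (arm, reward) history."""
    rng = random.Random(seed)
    history = []
    for _ in range(steps):
        arm = rng.randrange(len(arm_means))  # placeholder: uniform selection rule
        reward = pull(arm_means, arm, rng)
        history.append((arm, reward))
    return history
```

A pure-exploration learner would replace the uniform selection rule with an adaptive one and, after the loop, output a recommendation rather than maximize reward.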

Dual-Directed Algorithm Design for Efficient Pure Exploration

C Qin, W You - arXiv preprint arXiv:2310.19319, 2023 - arxiv.org
We consider pure-exploration problems in the context of stochastic sequential adaptive
experiments with a finite set of alternative options. The goal of the decision-maker is to …

Active clustering with bandit feedback

V Thuot, A Carpentier, C Giraud, N Verzelen - arXiv preprint arXiv …, 2024 - arxiv.org
We investigate the Active Clustering Problem (ACP). A learner interacts with an $N$-armed
stochastic bandit with $d$-dimensional sub-Gaussian feedback. There exists a hidden …

An Anytime Algorithm for Good Arm Identification

M Jourdan, C Réda - arXiv preprint arXiv:2310.10359, 2023 - arxiv.org
In good arm identification (GAI), the goal is to identify an arm whose average performance
exceeds a given threshold, referred to as a good arm, if one exists. Few works have studied GAI …