Parallelised Bayesian optimisation via Thompson sampling

K Kandasamy, A Krishnamurthy… - International …, 2018 - proceedings.mlr.press
We design and analyse variations of the classical Thompson sampling (TS) procedure for
Bayesian optimisation (BO) in settings where function evaluations are expensive but can be …
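
To make the named procedure concrete, here is a minimal sketch of one plain sequential Thompson-sampling loop for Gaussian-process BO; the RBF kernel, the candidate grid, and the toy objective are illustrative assumptions, and the parallel (synchronous/asynchronous) variants the paper actually designs and analyses are not reproduced here.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel between the row vectors of A and B."""
    d2 = (A[:, None, :] - B[None, :, :]) ** 2
    return np.exp(-0.5 * d2.sum(-1) / lengthscale ** 2)

def thompson_sampling_bo(f, candidates, n_iters=20, noise=1e-3, rng=None):
    """Sequential GP Thompson sampling: at each step, draw one sample from the
    GP posterior over the candidate set and evaluate f at its argmax."""
    rng = np.random.default_rng(rng)
    X, y = [], []
    # Initialise with one random candidate so the posterior is well defined.
    x0 = candidates[rng.integers(len(candidates))]
    X.append(x0); y.append(f(x0))
    for _ in range(n_iters):
        Xa, ya = np.array(X), np.array(y)
        K = rbf_kernel(Xa, Xa) + noise * np.eye(len(Xa))
        Ks = rbf_kernel(candidates, Xa)
        Kss = rbf_kernel(candidates, candidates)
        Kinv = np.linalg.inv(K)
        mu = Ks @ Kinv @ ya
        cov = Kss - Ks @ Kinv @ Ks.T
        # One posterior sample plays the role of a randomised acquisition function.
        sample = rng.multivariate_normal(mu, cov + 1e-8 * np.eye(len(candidates)))
        x_next = candidates[int(np.argmax(sample))]
        X.append(x_next); y.append(f(x_next))
    best = int(np.argmax(y))
    return np.array(X)[best], y[best]

# Toy usage on a 1-D objective (maximisation over a grid of 200 candidates).
grid = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
x_best, y_best = thompson_sampling_bo(lambda x: float(-np.sin(8 * x[0]) * x[0]), grid, rng=0)
```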

Batched multi-armed bandits problem

Z Gao, Y Han, Z Ren, Z Zhou - Advances in Neural …, 2019 - proceedings.neurips.cc
In this paper, we study the multi-armed bandit problem in the batched setting where the
employed policy must split data into a small number of batches. While the minimax regret for …
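
As an illustration of the batch constraint described in the snippet (all pulls in a batch are committed before any of their rewards are seen), here is a minimal batched successive-elimination sketch; the Bernoulli arms, batch sizes, and confidence radius are assumptions of the example, not the policy whose minimax regret the paper studies.

```python
import numpy as np

def batched_successive_elimination(arm_means, batch_sizes, rng=None):
    """Play a small number of batches; within a batch, pulls are fixed in
    advance and rewards are only revealed once the whole batch finishes."""
    rng = np.random.default_rng(rng)
    k = len(arm_means)
    pulls, sums = np.zeros(k), np.zeros(k)
    active = list(range(k))
    for batch in batch_sizes:
        per_arm = max(1, batch // len(active))
        # Commit to the whole batch of pulls before observing anything.
        for arm in active:
            rewards = rng.binomial(1, arm_means[arm], size=per_arm)
            sums[arm] += rewards.sum()
            pulls[arm] += per_arm
        # After the batch, eliminate arms that look clearly suboptimal.
        means = sums[active] / pulls[active]
        radius = np.sqrt(np.log(1 / 0.01) / (2 * pulls[active]))
        keep = means + radius >= (means - radius).max()
        active = [a for a, kept in zip(active, keep) if kept]
    return active  # surviving candidate best arms

# Example: 5 Bernoulli arms, 4 batches of growing size.
print(batched_successive_elimination([0.2, 0.4, 0.5, 0.55, 0.6],
                                     [50, 100, 400, 2000], rng=1))
```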

Inference for batched bandits

K Zhang, L Janson, S Murphy - Advances in neural …, 2020 - proceedings.neurips.cc
As bandit algorithms are increasingly utilized in scientific studies and industrial applications,
there is an associated increasing need for reliable inference methods based on the resulting …

Linear bandits with limited adaptivity and learning distributional optimal design

Y Ruan, J Yang, Y Zhou - Proceedings of the 53rd Annual ACM SIGACT …, 2021 - dl.acm.org
Motivated by practical needs such as large-scale learning, we study the impact of adaptivity
constraints on linear contextual bandits, a central problem in online learning and decision …
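
To make the adaptivity constraint concrete, the sketch below runs a linear contextual bandit in which the ridge-regression estimate of the unknown parameter may be refreshed only at a few pre-set batch boundaries; the greedy arm choice, Gaussian contexts, and noise level are illustrative assumptions and are unrelated to the distributional optimal design developed in the paper.

```python
import numpy as np

def batched_linear_bandit(theta_star, horizon, update_rounds, d=5, n_arms=20,
                          lam=1.0, noise=0.1, rng=None):
    """Linear bandit where the learner may update its policy only at the
    rounds listed in `update_rounds` (limited adaptivity)."""
    rng = np.random.default_rng(rng)
    V = lam * np.eye(d)          # regularised Gram matrix
    b = np.zeros(d)              # sum of reward-weighted contexts
    theta_hat = np.zeros(d)      # estimate, frozen between batch boundaries
    regret = 0.0
    for t in range(horizon):
        if t in update_rounds:                      # batch boundary: re-estimate
            theta_hat = np.linalg.solve(V, b)
        arms = rng.standard_normal((n_arms, d))     # fresh contexts each round
        arms /= np.linalg.norm(arms, axis=1, keepdims=True)
        choice = arms[int(np.argmax(arms @ theta_hat))]
        reward = choice @ theta_star + noise * rng.standard_normal()
        V += np.outer(choice, choice)
        b += reward * choice
        regret += (arms @ theta_star).max() - choice @ theta_star
    return regret

theta = np.ones(5) / np.sqrt(5)
print(batched_linear_bandit(theta, horizon=2000, update_rounds={0, 50, 500}, rng=0))
```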

Learning with limited rounds of adaptivity: Coin tossing, multi-armed bandits, and ranking from pairwise comparisons

A Agarwal, S Agarwal, S Assadi… - … on Learning Theory, 2017 - proceedings.mlr.press
In many learning settings, active/adaptive querying is possible, but the number of rounds of
adaptivity is limited. We study the relationship between query complexity and adaptivity in …

Stochastic bandit models for delayed conversions

C Vernade, O Cappé, V Perchet - arXiv preprint arXiv:1706.09186, 2017 - arxiv.org
Online advertising and product recommendation are important application domains for
multi-armed bandit methods. In these fields, the reward that is immediately available is most …
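
The delayed-conversion setting can be pictured with a small simulator in which a pull may trigger a conversion that only becomes observable several rounds later, or never within the horizon; the uniform placeholder policy, Bernoulli conversions, and geometric delays below are assumptions of the illustration, not the censoring model or estimators of the paper.

```python
import numpy as np
from collections import defaultdict

def delayed_conversion_stream(conversion_probs, horizon, mean_delay=20, rng=None):
    """Simulate an environment where a pull at round t may yield a conversion
    (reward 1) that only becomes observable at round t + delay."""
    rng = np.random.default_rng(rng)
    pending = defaultdict(list)        # arrival round -> list of (arm, reward)
    observed = []                      # what the learner actually sees, per round
    for t in range(horizon):
        arm = rng.integers(len(conversion_probs))       # placeholder uniform policy
        if rng.random() < conversion_probs[arm]:
            delay = rng.geometric(1.0 / mean_delay)     # conversion happens later
            pending[t + delay].append((arm, 1))
        observed.append(pending.pop(t, []))             # only matured feedback
    return observed

feedback = delayed_conversion_stream([0.05, 0.1], horizon=500, rng=3)
print(sum(r for round_fb in feedback for _, r in round_fb), "conversions observed in time")
```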

Regret bounds for batched bandits

H Esfandiari, A Karbasi, A Mehrabian… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
We present simple algorithms for batched stochastic multi-armed bandit and batched
stochastic linear bandit problems. We prove bounds for their expected regrets that improve …
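
For reference, the expected regret bounded in this batched line of work is the usual cumulative notion; the batch constraint only restricts when past observations may influence the policy. The notation below is generic rather than taken from the paper:

```latex
R_T \;=\; T\,\mu^\star \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} \mu_{A_t}\right],
\qquad \mu^\star \;=\; \max_{1 \le i \le K} \mu_i ,
```

where $A_t$ is the arm pulled at round $t$ and the $T$ pulls must be grouped into a small number of batches whose contents are fixed before any reward in the batch is observed.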

Revisiting simple regret: Fast rates for returning a good arm

Y Zhao, C Stephens, C Szepesvári… - … on Machine Learning, 2023 - proceedings.mlr.press
Simple regret is a natural and parameter-free performance criterion for pure exploration in
multi-armed bandits yet is less popular than the probability of missing the best arm or an …
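
Concretely, if $\hat{J}_n$ denotes the arm recommended after $n$ exploration rounds, the simple regret referred to above is the standard quantity (generic notation, not copied from the paper):

```latex
r_n \;=\; \mu^\star - \mathbb{E}\big[\mu_{\hat{J}_n}\big],
\qquad \mu^\star \;=\; \max_{i}\, \mu_i ,
```

which the snippet contrasts with the probability of missing the best arm, $\Pr(\hat{J}_n \neq i^\star)$.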

Best arm identification in multi-armed bandits with delayed feedback

A Grover, T Markov, P Attia, N Jin… - International …, 2018 - proceedings.mlr.press
In this paper, we propose a generalization of the best arm identification problem in
stochastic multi-armed bandits (MAB) to the setting where every pull of an arm is associated …

Adaptive algorithms for relaxed pareto set identification

C Kone, E Kaufmann, L Richert - Advances in Neural …, 2023 - proceedings.neurips.cc
In this paper we revisit the fixed-confidence identification of the Pareto optimal set in a multi-
objective multi-armed bandit model. As the sample complexity to identify the exact Pareto set …
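
Since the snippet hinges on what the Pareto optimal set of arms is, here is a minimal helper that computes it from known mean vectors (the paper, by contrast, must identify it from samples at a fixed confidence level); the convention that larger is better in every objective is an assumption of the example.

```python
import numpy as np

def pareto_set(means):
    """Return indices of arms whose mean vectors are not dominated, i.e. no other
    arm is at least as good in every objective and strictly better in at least one."""
    means = np.asarray(means, dtype=float)
    dominated = [
        any(np.all(other >= mu) and np.any(other > mu)
            for j, other in enumerate(means) if j != i)
        for i, mu in enumerate(means)
    ]
    return [i for i, d in enumerate(dominated) if not d]

# Three bi-objective arms: arm 2 is dominated by arm 0.
print(pareto_set([[0.8, 0.3], [0.4, 0.9], [0.6, 0.2]]))  # -> [0, 1]
```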