[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Hyperband: A novel bandit-based approach to hyperparameter optimization

L Li, K Jamieson, G DeSalvo, A Rostamizadeh… - Journal of Machine …, 2018 - jmlr.org
Performance of machine learning algorithms depends critically on identifying a good set of
hyperparameters. While recent approaches use Bayesian optimization to adaptively select …

Time-uniform, nonparametric, nonasymptotic confidence sequences

SR Howard, A Ramdas, J McAuliffe, J Sekhon - 2021 - projecteuclid.org
The Annals of Statistics 2021, Vol. 49, No. 2, 1055–1080
https://doi.org/10.1214/20-AOS1991 © Institute of …

Simple Bayesian algorithms for best arm identification

D Russo - Conference on Learning Theory, 2016 - proceedings.mlr.press
This paper considers the optimal adaptive allocation of measurement effort for identifying the
best among a finite set of options or designs. An experimenter sequentially chooses designs …

Top two algorithms revisited

M Jourdan, R Degenne, D Baudry… - Advances in …, 2022 - proceedings.neurips.cc
Top two algorithms arose as an adaptation of Thompson sampling to best arm identification
in multi-armed bandit models for parametric families of arms. They select the next arm to …

Combinatorial pure exploration of multi-armed bandits

S Chen, T Lin, I King, MR Lyu… - Advances in neural …, 2014 - proceedings.neurips.cc
We study the combinatorial pure exploration (CPE) problem in the stochastic multi-armed
bandit setting, where a learner explores a set of arms with the objective of identifying …

Improving the expected improvement algorithm

C Qin, D Klabjan, D Russo - Advances in Neural …, 2017 - proceedings.neurips.cc
The expected improvement (EI) algorithm is a popular strategy for information collection in
optimization under uncertainty. The algorithm is widely known to be too greedy, but …

Inference for batched bandits

K Zhang, L Janson, S Murphy - Advances in neural …, 2020 - proceedings.neurips.cc
As bandit algorithms are increasingly utilized in scientific studies and industrial applications,
there is an associated increasing need for reliable inference methods based on the resulting …

Social learning in multi agent multi armed bandits

A Sankararaman, A Ganesh, S Shakkottai - Proceedings of the ACM on …, 2019 - dl.acm.org
Motivated by emerging need of learning algorithms for large scale networked and
decentralized systems, we introduce a distributed version of the classical stochastic Multi …

Tight (lower) bounds for the fixed budget best arm identification bandit problem

A Carpentier, A Locatelli - Conference on Learning Theory, 2016 - proceedings.mlr.press
We consider the problem of best arm identification with a fixed budget T, in the K-armed
stochastic bandit setting, with arms distribution defined on [0, 1]. We prove that any …