Multi-armed bandit experimental design: Online decision-making and adaptive inference

D Simchi-Levi, C Wang - International Conference on …, 2023 - proceedings.mlr.press
The multi-armed bandit is well known for its efficiency in online decision-making, in the sense
of minimizing the loss of the participants' welfare during experiments (i.e., the regret). In …
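
For reference, the regret mentioned in this snippet is conventionally the expected welfare gap between always playing the best arm and the arms actually played; a standard textbook formulation (not quoted from the paper) is:

```latex
% Cumulative expected regret over a horizon of T pulls of arms 1..K:
% mu_a is the mean reward of arm a, A_t the arm selected at step t.
R_T \;=\; T\,\mu^{*} \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} \mu_{A_t}\right],
\qquad \mu^{*} \;=\; \max_{1 \le a \le K} \mu_a .
```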

Online multi-armed bandits with adaptive inference

M Dimakopoulou, Z Ren… - Advances in Neural …, 2021 - proceedings.neurips.cc
During online decision making in Multi-Armed Bandits (MAB), one needs to conduct
inference on the true mean reward of each arm based on data collected so far at each step …
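
A minimal sketch of the bookkeeping this snippet alludes to: per-arm counts and running means, updated after every pull, on which step-by-step inference can be based. The class name and interface are illustrative, not taken from the paper.

```python
import numpy as np

class ArmStats:
    """Running per-arm statistics maintained during a bandit run."""

    def __init__(self, n_arms: int):
        self.counts = np.zeros(n_arms, dtype=int)
        self.means = np.zeros(n_arms)

    def update(self, arm: int, reward: float) -> None:
        # Incremental mean: m_n = m_{n-1} + (x_n - m_{n-1}) / n
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```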

A survey of online experiment design with the stochastic multi-armed bandit

G Burtini, J Loeppky, R Lawrence - arXiv preprint arXiv:1510.00757, 2015 - arxiv.org
Adaptive and sequential experiment design is a well-studied area in numerous domains. We
survey and synthesize work on the online statistical learning paradigm referred to as multi …

Multi-armed bandits in the wild: Pitfalls and strategies in online experiments

DI Mattos, J Bosch, HH Olsson - Information and Software Technology, 2019 - Elsevier
Context: Delivering faster value to customers with online experimentation is an emerging
practice in industry. Multi-Armed Bandit (MAB)-based experiments have the potential to …

Are sample means in multi-armed bandits positively or negatively biased?

J Shin, A Ramdas, A Rinaldo - Advances in Neural …, 2019 - proceedings.neurips.cc
It is well known that in stochastic multi-armed bandits (MAB), the sample mean of an arm is
typically not an unbiased estimator of its true mean. In this paper, we decouple three …
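
The bias is easy to reproduce in simulation: under a greedy sampling rule, an arm abandoned after unlucky draws freezes a low estimate, so sample means come out below the truth on average. A minimal illustration (mine, not the paper's experiment):

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.5, 0.5])        # two identical Bernoulli arms
T, runs = 200, 5_000
bias = np.zeros(2)

for _ in range(runs):
    counts = np.ones(2)
    means = rng.binomial(1, true_means).astype(float)  # one initial pull each
    for _ in range(T):
        a = int(np.argmax(means))                      # greedy: highest sample mean
        r = rng.binomial(1, true_means[a])
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]
    bias += means - true_means

print("average bias of the sample means:", bias / runs)  # typically negative
```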

Unreasonable effectiveness of greedy algorithms in multi-armed bandit with many arms

M Bayati, N Hamidi, R Johari… - Advances in Neural …, 2020 - proceedings.neurips.cc
We study the structure of regret-minimizing policies in the many-armed Bayesian multi-
armed bandit problem: in particular, with $k$ the number of arms and $T$ the time …
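
One device studied in the many-armed regime is running plain greedy on a random subsample of the k arms. A sketch of that idea follows; the subsample size m is a free parameter here, not the paper's tuning:

```python
import numpy as np

def subsampled_greedy(true_means, T, m, rng):
    """Greedy restricted to a random subsample of m arms (illustrative
    sketch of the kind of policy analyzed in the many-armed regime)."""
    arms = rng.choice(len(true_means), size=m, replace=False)
    counts = np.ones(m)
    means = rng.binomial(1, true_means[arms]).astype(float)  # one pull each
    total = means.sum()
    for _ in range(T - m):
        i = int(np.argmax(means))                 # exploit current best estimate
        r = rng.binomial(1, true_means[arms[i]])
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]
        total += r
    return total

rng = np.random.default_rng(1)
k, T = 10_000, 5_000
true_means = rng.uniform(0, 1, size=k)
print(subsampled_greedy(true_means, T, m=int(np.sqrt(T)), rng=rng))
```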

[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Tight regret bounds for single-pass streaming multi-armed bandits

C Wang - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Regret minimization in streaming multi-armed bandits (MABs) has been studied extensively,
and recent work has shown that algorithms with $o(K)$ memory have to incur $\Omega …
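
To make the memory constraint concrete, here is a toy single-pass policy that keeps statistics for only one arm at a time (O(1) memory, far below the o(K) threshold the lower bound concerns). It is purely illustrative, not the paper's algorithm, and carries no regret guarantee:

```python
import numpy as np

def single_pass_best_arm(arm_stream, pulls_per_duel, rng):
    """Hold a single champion arm; compare each arriving arm against it
    with a fixed pull budget and keep whichever looks better."""
    champion, champ_mean = None, -np.inf
    for mu in arm_stream:
        est = rng.binomial(pulls_per_duel, mu) / pulls_per_duel
        if est > champ_mean:
            champion, champ_mean = mu, est
    return champion, champ_mean

rng = np.random.default_rng(2)
stream = rng.uniform(0, 1, size=1000)   # arms arrive one by one
print(single_pass_best_arm(stream, pulls_per_duel=50, rng=rng))
```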

[HTML] An empirical evaluation of active inference in multi-armed bandits

D Marković, H Stojić, S Schwöbel, SJ Kiebel - Neural Networks, 2021 - Elsevier
A key feature of sequential decision making under uncertainty is the need to balance between
exploiting (choosing the best action according to current knowledge) and exploring …
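
Epsilon-greedy is the simplest policy that makes this trade-off explicit; a minimal sketch for illustration (the paper itself evaluates active-inference agents, not this baseline):

```python
import numpy as np

def epsilon_greedy(true_means, T, eps, rng):
    """With probability eps explore a uniformly random arm;
    otherwise exploit the arm with the best current estimate."""
    k = len(true_means)
    counts, means = np.zeros(k), np.zeros(k)
    for _ in range(T):
        a = rng.integers(k) if rng.random() < eps else int(np.argmax(means))
        r = rng.binomial(1, true_means[a])
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]
    return means

rng = np.random.default_rng(3)
print(epsilon_greedy(np.array([0.2, 0.5, 0.8]), T=5_000, eps=0.1, rng=rng))
```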

Minimax concave penalized multi-armed bandit model with high-dimensional covariates

X Wang, M Wei, T Yao - International Conference on …, 2018 - proceedings.mlr.press
In this paper, we propose a Minimax Concave Penalized Multi-Armed Bandit (MCP-Bandit)
algorithm for a decision-maker facing high-dimensional data with latent sparse structure in …
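
A rough sketch of the exploitation step in such a penalized contextual bandit: fit a sparse linear reward model per arm on logged data, then play the arm with the highest prediction for the current context. Lasso stands in for the MCP penalty here because scikit-learn does not ship MCP; the names and interface are illustrative, not the paper's algorithm:

```python
import numpy as np
from sklearn.linear_model import Lasso

def choose_arm(X_hist, r_hist, x_now, alpha=0.1):
    """X_hist[a]: past contexts where arm a was played (n_a x d array);
    r_hist[a]: the rewards observed there; x_now: current d-dim context."""
    preds = []
    for X, r in zip(X_hist, r_hist):
        model = Lasso(alpha=alpha).fit(X, r)              # sparse reward model
        preds.append(float(model.predict(x_now.reshape(1, -1))[0]))
    return int(np.argmax(preds))                          # pure exploitation step
```

Each arm needs at least a few logged samples before the per-arm fit is meaningful; an initial forced-exploration phase usually supplies them.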