Multi-armed bandit experimental design: Online decision-making and adaptive inference
D Simchi-Levi, C Wang - International Conference on …, 2023 - proceedings.mlr.press
The multi-armed bandit is well known for its efficiency in online decision-making in terms
of minimizing the loss of the participants' welfare during experiments (i.e., the regret). In …
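The regret mentioned in this abstract can be illustrated concretely. The sketch below is not from the cited paper; it is a minimal, hypothetical illustration of the standard definition: cumulative regret is the gap between the best arm's mean reward and the mean reward of each arm actually pulled.

```python
def cumulative_regret(true_means, pulls):
    """Expected cumulative regret of a pull sequence against the best fixed arm.

    true_means: list of each arm's true mean reward (illustrative values).
    pulls: sequence of arm indices chosen by some policy.
    """
    best = max(true_means)
    return sum(best - true_means[a] for a in pulls)

# A run that mostly pulls the better arm (index 1) incurs little regret:
# two pulls of the 0.3 arm, each losing 0.4, give total regret 0.8.
loss = cumulative_regret([0.3, 0.7], [0, 1, 1, 1, 0, 1])
print(loss)
```

A policy with low regret in this sense is one whose pull sequence concentrates on the best arm as data accumulates.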
Online multi-armed bandits with adaptive inference
M Dimakopoulou, Z Ren… - Advances in Neural …, 2021 - proceedings.neurips.cc
During online decision making in Multi-Armed Bandits (MAB), one needs to conduct
inference on the true mean reward of each arm based on data collected so far at each step …
A survey of online experiment design with the stochastic multi-armed bandit
Adaptive and sequential experiment design is a well-studied area in numerous domains. We
survey and synthesize the work of the online statistical learning paradigm referred to as multi …
Multi-armed bandits in the wild: Pitfalls and strategies in online experiments
Context Delivering faster value to customers with online experimentation is an emerging
practice in industry. Multi-Armed Bandit (MAB) based experiments have the potential to …
Are sample means in multi-armed bandits positively or negatively biased?
It is well known that in stochastic multi-armed bandits (MAB), the sample mean of an arm is
typically not an unbiased estimator of its true mean. In this paper, we decouple three …
Unreasonable effectiveness of greedy algorithms in multi-armed bandit with many arms
We study the structure of regret-minimizing policies in the {\em many-armed} Bayesian multi-armed bandit problem: in particular, with $k$ the number of arms and $T$ the time …
[BOOK][B] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Tight regret bounds for single-pass streaming multi-armed bandits
C Wang - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Regret minimization in streaming multi-armed bandits (MABs) has been studied extensively,
and recent work has shown that algorithms with $o(K)$ memory have to incur $\Omega …
[HTML] An empirical evaluation of active inference in multi-armed bandits
A key feature of sequential decision making under uncertainty is a need to balance between
exploiting—choosing the best action according to the current knowledge, and exploring …
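The exploit/explore balance described in this abstract is classically handled by epsilon-greedy (a standard baseline, not the active-inference method the cited paper evaluates). A minimal sketch on Bernoulli arms, with illustrative parameter values:

```python
import random

def epsilon_greedy(true_means, horizon, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Bernoulli bandit: exploit the empirically best
    arm with probability 1 - epsilon, explore a random arm otherwise."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    means = [0.0] * k   # running sample mean per arm
    total_reward = 0.0
    for _ in range(horizon):
        unpulled = [i for i in range(k) if counts[i] == 0]
        if unpulled:
            a = rng.choice(unpulled)          # explore: try every arm once
        elif rng.random() < epsilon:
            a = rng.randrange(k)              # explore: uniform random arm
        else:
            a = max(range(k), key=lambda i: means[i])  # exploit: best estimate
        r = 1.0 if rng.random() < true_means[a] else 0.0  # Bernoulli reward
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]  # incremental mean update
        total_reward += r
    return total_reward, means

reward, est = epsilon_greedy([0.2, 0.8], horizon=5000, epsilon=0.1, seed=1)
```

With enough exploration the estimates identify the better arm, and exploitation then concentrates pulls on it; the tension is that every exploration step sacrifices immediate reward for information.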
Minimax concave penalized multi-armed bandit model with high-dimensional covariates
In this paper, we propose a Minimax Concave Penalized Multi-Armed Bandit (MCP-Bandit)
algorithm for a decision-maker facing high-dimensional data with latent sparse structure in …