Multi-armed bandit experimental design: Online decision-making and adaptive inference

D Simchi-Levi, C Wang - International Conference on …, 2023 - proceedings.mlr.press
The multi-armed bandit is well known for its efficiency in online decision-making, in the sense
of minimizing the loss of the participants' welfare during experiments (i.e., the regret). In …
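
For reference, the regret mentioned in this snippet is conventionally the expected welfare gap between always playing the best arm and the arms actually played; a standard textbook formulation (not quoted from the paper) is:

```latex
% Cumulative expected regret over a horizon of T pulls of arms 1..K:
% mu_a is the mean reward of arm a, A_t the arm selected at step t.
R_T \;=\; T\,\mu^{*} \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} \mu_{A_t}\right],
\qquad \mu^{*} \;=\; \max_{1 \le a \le K} \mu_a .
```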

Online multi-armed bandits with adaptive inference

M Dimakopoulou, Z Ren… - Advances in Neural …, 2021 - proceedings.neurips.cc
During online decision making in Multi-Armed Bandits (MAB), one needs to conduct
inference on the true mean reward of each arm based on data collected so far at each step …
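
A minimal sketch of the bookkeeping this snippet alludes to: per-arm counts and running means, updated after every pull, on which step-by-step inference can be based. The class name and interface are illustrative, not taken from the paper.

```python
import numpy as np

class ArmStats:
    """Running per-arm statistics maintained during a bandit run."""

    def __init__(self, n_arms: int):
        self.counts = np.zeros(n_arms, dtype=int)
        self.means = np.zeros(n_arms)

    def update(self, arm: int, reward: float) -> None:
        # Incremental mean: m_n = m_{n-1} + (x_n - m_{n-1}) / n
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```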

A survey of online experiment design with the stochastic multi-armed bandit

G Burtini, J Loeppky, R Lawrence - arXiv preprint arXiv:1510.00757, 2015 - arxiv.org
Adaptive and sequential experiment design is a well-studied area in numerous domains. We
survey and synthesize work on the online statistical learning paradigm referred to as multi …

Multi-armed bandits in the wild: Pitfalls and strategies in online experiments

DI Mattos, J Bosch, HH Olsson - Information and Software Technology, 2019 - Elsevier
Context: Delivering faster value to customers with online experimentation is an emerging
practice in industry. Multi-Armed Bandit (MAB)-based experiments have the potential to …

Are sample means in multi-armed bandits positively or negatively biased?

J Shin, A Ramdas, A Rinaldo - Advances in Neural …, 2019 - proceedings.neurips.cc
It is well known that in stochastic multi-armed bandits (MAB), the sample mean of an arm is
typically not an unbiased estimator of its true mean. In this paper, we decouple three …
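
The bias is easy to reproduce in simulation: under a greedy sampling rule, an arm abandoned after unlucky draws freezes a low estimate, so sample means come out below the truth on average. A minimal illustration (mine, not the paper's experiment):

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.5, 0.5])        # two identical Bernoulli arms
T, runs = 200, 5_000
bias = np.zeros(2)

for _ in range(runs):
    counts = np.ones(2)
    means = rng.binomial(1, true_means).astype(float)  # one initial pull each
    for _ in range(T):
        a = int(np.argmax(means))                      # greedy: highest sample mean
        r = rng.binomial(1, true_means[a])
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]
    bias += means - true_means

print("average bias of the sample means:", bias / runs)  # typically negative
```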

Unreasonable effectiveness of greedy algorithms in multi-armed bandit with many arms

M Bayati, N Hamidi, R Johari… - Advances in Neural …, 2020 - proceedings.neurips.cc
We study the structure of regret-minimizing policies in the many-armed Bayesian multi-
armed bandit problem: in particular, with $k$ the number of arms and $T$ the time …
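
One device studied in the many-armed regime is running plain greedy on a random subsample of the k arms. A sketch of that idea follows; the subsample size m is a free parameter here, not the paper's tuning:

```python
import numpy as np

def subsampled_greedy(true_means, T, m, rng):
    """Greedy restricted to a random subsample of m arms (illustrative
    sketch of the kind of policy analyzed in the many-armed regime)."""
    arms = rng.choice(len(true_means), size=m, replace=False)
    counts = np.ones(m)
    means = rng.binomial(1, true_means[arms]).astype(float)  # one pull each
    total = means.sum()
    for _ in range(T - m):
        i = int(np.argmax(means))                 # exploit current best estimate
        r = rng.binomial(1, true_means[arms[i]])
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]
        total += r
    return total

rng = np.random.default_rng(1)
k, T = 10_000, 5_000
true_means = rng.uniform(0, 1, size=k)
print(subsampled_greedy(true_means, T, m=int(np.sqrt(T)), rng=rng))
```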

[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Tight regret bounds for single-pass streaming multi-armed bandits

C Wang - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Regret minimization in streaming multi-armed bandits (MABs) has been studied extensively,
and recent work has shown that algorithms with $o(K)$ memory have to incur $\Omega …
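
To make the memory constraint concrete, here is a toy single-pass policy that keeps statistics for only one arm at a time (O(1) memory, far below the o(K) threshold the lower bound concerns). It is purely illustrative, not the paper's algorithm, and carries no regret guarantee:

```python
import numpy as np

def single_pass_best_arm(arm_stream, pulls_per_duel, rng):
    """Hold a single champion arm; compare each arriving arm against it
    with a fixed pull budget and keep whichever looks better."""
    champion, champ_mean = None, -np.inf
    for mu in arm_stream:
        est = rng.binomial(pulls_per_duel, mu) / pulls_per_duel
        if est > champ_mean:
            champion, champ_mean = mu, est
    return champion, champ_mean

rng = np.random.default_rng(2)
stream = rng.uniform(0, 1, size=1000)   # arms arrive one by one
print(single_pass_best_arm(stream, pulls_per_duel=50, rng=rng))
```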

[HTML] An empirical evaluation of active inference in multi-armed bandits

D Marković, H Stojić, S Schwöbel, SJ Kiebel - Neural Networks, 2021 - Elsevier
A key feature of sequential decision making under uncertainty is the need to balance between
exploiting (choosing the best action according to current knowledge) and exploring …
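
Epsilon-greedy is the simplest policy that makes this trade-off explicit; a minimal sketch for illustration (the paper itself evaluates active-inference agents, not this baseline):

```python
import numpy as np

def epsilon_greedy(true_means, T, eps, rng):
    """With probability eps explore a uniformly random arm;
    otherwise exploit the arm with the best current estimate."""
    k = len(true_means)
    counts, means = np.zeros(k), np.zeros(k)
    for _ in range(T):
        a = rng.integers(k) if rng.random() < eps else int(np.argmax(means))
        r = rng.binomial(1, true_means[a])
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]
    return means

rng = np.random.default_rng(3)
print(epsilon_greedy(np.array([0.2, 0.5, 0.8]), T=5_000, eps=0.1, rng=rng))
```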

Minimax concave penalized multi-armed bandit model with high-dimensional covariates

X Wang, M Wei, T Yao - International Conference on …, 2018 - proceedings.mlr.press
In this paper, we propose a Minimax Concave Penalized Multi-Armed Bandit (MCP-Bandit)
algorithm for a decision-maker facing high-dimensional data with latent sparse structure in …
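
A rough sketch of the exploitation step in such a penalized contextual bandit: fit a sparse linear reward model per arm on logged data, then play the arm with the highest prediction for the current context. Lasso stands in for the MCP penalty here because scikit-learn does not ship MCP; the names and interface are illustrative, not the paper's algorithm:

```python
import numpy as np
from sklearn.linear_model import Lasso

def choose_arm(X_hist, r_hist, x_now, alpha=0.1):
    """X_hist[a]: past contexts where arm a was played (n_a x d array);
    r_hist[a]: the rewards observed there; x_now: current d-dim context."""
    preds = []
    for X, r in zip(X_hist, r_hist):
        model = Lasso(alpha=alpha).fit(X, r)              # sparse reward model
        preds.append(float(model.predict(x_now.reshape(1, -1))[0]))
    return int(np.argmax(preds))                          # pure exploitation step
```

Each arm needs at least a few logged samples before the per-arm fit is meaningful; an initial forced-exploration phase usually supplies them.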