Anytime-valid off-policy inference for contextual bandits
Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in
healthcare and the tech industry. They involve online learning algorithms that adaptively …
Multi-armed bandit experimental design: Online decision-making and adaptive inference
D Simchi-Levi, C Wang - International Conference on …, 2023 - proceedings.mlr.press
Multi-armed bandit has been well-known for its efficiency in online decision-making in terms
of minimizing the loss of the participants' welfare during experiments (i.e., the regret). In …
Adaptivity and confounding in multi-armed bandit experiments
We explore a new model of bandit experiments where a potentially nonstationary sequence
of contexts influences arms' performance. Context-unaware algorithms risk confounding …
Non-stationary experimental design under linear trends
D Simchi-Levi, C Wang… - Advances in Neural …, 2023 - proceedings.neurips.cc
Experimentation has been critical and increasingly popular across various domains, such as
clinical trials and online platforms, due to its widely recognized benefits. One of the primary …
Distributionally robust batch contextual bandits
Policy learning using historical observational data is an important problem that has
widespread applications. Examples include selecting offers, prices, or advertisements for …
Non-stationary representation learning in sequential linear bandits
In this paper, we study representation learning for multi-task decision-making in non-
stationary environments. We consider the framework of sequential linear bandits, where the …
Statistical inference on multi-armed bandits with delayed feedback
Multi-armed bandit (MAB) algorithms have been increasingly used to complement or
integrate with A/B tests and randomized clinical trials in e-commerce, healthcare, and …
Adaptive linear estimating equations
Sequential data collection has emerged as a widely adopted technique for enhancing the
efficiency of data gathering processes. Despite its advantages, such data collection …
Multi-armed bandit vs. A/B tests in e-commerce: confidence interval and hypothesis test power perspectives
An emerging dilemma facing practitioners in large-scale online experimentation for e-
commerce is whether to use Multi-Armed Bandit (MAB) algorithms for testing or traditional …
Entropy regularization for population estimation
Entropy regularization is known to improve exploration in sequential decision-making
problems. We show that this same mechanism can also lead to nearly unbiased and lower …