Anytime-valid off-policy inference for contextual bandits

I Waudby-Smith, L Wu, A Ramdas… - ACM/IMS Journal of …, 2024 - dl.acm.org
Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in
healthcare and the tech industry. They involve online learning algorithms that adaptively …

Multi-armed bandit experimental design: Online decision-making and adaptive inference

D Simchi-Levi, C Wang - International Conference on …, 2023 - proceedings.mlr.press
Multi-armed bandits are well known for their efficiency in online decision-making in terms
of minimizing the loss of the participants' welfare during experiments (i.e., the regret). In …

[PDF] Adaptivity and confounding in multi-armed bandit experiments

C Qin, D Russo - arXiv preprint arXiv:2202.09036, 2022 - aeaweb.org
We explore a new model of bandit experiments where a potentially nonstationary sequence
of contexts influences arms' performance. Context-unaware algorithms risk confounding …

Non-stationary experimental design under linear trends

D Simchi-Levi, C Wang… - Advances in Neural …, 2023 - proceedings.neurips.cc
Experimentation has been critical and increasingly popular across various domains, such as
clinical trials and online platforms, due to its widely recognized benefits. One of the primary …

Distributionally robust batch contextual bandits

N Si, F Zhang, Z Zhou, J Blanchet - Management Science, 2023 - pubsonline.informs.org
Policy learning using historical observational data is an important problem that has
widespread applications. Examples include selecting offers, prices, or advertisements for …

Non-stationary representation learning in sequential linear bandits

Y Qin, T Menara, S Oymak, SN Ching… - IEEE Open Journal of …, 2022 - ieeexplore.ieee.org
In this paper, we study representation learning for multi-task decision-making in non-
stationary environments. We consider the framework of sequential linear bandits, where the …

Statistical inference on multi-armed bandits with delayed feedback

L Shi, J Wang, T Wu - International Conference on Machine …, 2023 - proceedings.mlr.press
Multi-armed bandit (MAB) algorithms have been increasingly used to complement or
integrate with A/B tests and randomized clinical trials in e-commerce, healthcare, and …

Adaptive linear estimating equations

M Ying, K Khamaru, CH Zhang - Advances in Neural …, 2024 - proceedings.neurips.cc
Sequential data collection has emerged as a widely adopted technique for enhancing the
efficiency of data gathering processes. Despite its advantages, such data collection …

Multi-armed bandit vs. A/B tests in e-commerce: confidence interval and hypothesis test power perspectives

D Xiang, R West, J Wang, X Cui, J Huang - Proceedings of the 28th ACM …, 2022 - dl.acm.org
An emerging dilemma that faces practitioners in large-scale online experimentation for e-
commerce is whether to use Multi-Armed Bandit (MAB) algorithms for testing or traditional …

Entropy regularization for population estimation

B Chugg, P Henderson, J Goldin, DE Ho - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Entropy regularization is known to improve exploration in sequential decision-making
problems. We show that this same mechanism can also lead to nearly unbiased and lower …