Anytime-valid off-policy inference for contextual bandits

I Waudby-Smith, L Wu, A Ramdas… - ACM/IMS Journal of …, 2024 - dl.acm.org
Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in
healthcare and the tech industry. They involve online learning algorithms that adaptively …

Statistical inference with m-estimators on adaptively collected data

K Zhang, L Janson, S Murphy - Advances in neural …, 2021 - proceedings.neurips.cc
Bandit algorithms are increasingly used in real-world sequential decision-making problems.
Associated with this is an increased desire to be able to use the resulting datasets to answer …

OpenML benchmarking suites

B Bischl, G Casalicchio, M Feurer, P Gijsbers… - arXiv preprint arXiv …, 2017 - arxiv.org
Machine learning research depends on objectively interpretable, comparable, and
reproducible algorithm benchmarks. We advocate the use of curated, comprehensive suites …

Multi-armed bandit experimental design: Online decision-making and adaptive inference

D Simchi-Levi, C Wang - International Conference on …, 2023 - proceedings.mlr.press
Multi-armed bandit has been well-known for its efficiency in online decision-making in terms
of minimizing the loss of the participants' welfare during experiments (i.e., the regret). In …

A Primer on the Analysis of Randomized Experiments and a Survey of some Recent Advances

Y Bai, AM Shaikh, M Tabord-Meehan - arXiv preprint arXiv:2405.03910, 2024 - arxiv.org
The past two decades have witnessed a surge of new research in the analysis of
randomized experiments. The emergence of this literature may seem surprising given the …

Optimal treatment allocation for efficient policy evaluation in sequential decision making

T Li, C Shi, J Wang, F Zhou - Advances in Neural …, 2024 - proceedings.neurips.cc
A/B testing is critical for modern technological companies to evaluate the effectiveness of
newly developed products against standard baselines. This paper studies optimal designs …

Online multi-armed bandits with adaptive inference

M Dimakopoulou, Z Ren… - Advances in Neural …, 2021 - proceedings.neurips.cc
During online decision making in Multi-Armed Bandits (MAB), one needs to conduct
inference on the true mean reward of each arm based on data collected so far at each step …

Online statistical inference for matrix contextual bandit

Q Han, WW Sun, Y Zhang - arXiv preprint arXiv:2212.11385, 2022 - arxiv.org
Contextual bandit has been widely used for sequential decision-making based on the
current contextual information and historical feedback data. In modern applications, such …

Did we personalize? Assessing personalization by an online reinforcement learning algorithm using resampling

S Ghosh, R Kim, P Chhabria, R Dwivedi, P Klasnja… - Machine Learning, 2024 - Springer
There is a growing interest in using reinforcement learning (RL) to personalize sequences of
treatments in digital health to support users in adopting healthier behaviors. Such sequential …

Synthetically controlled bandits

V Farias, C Moallemi, T Peng, A Zheng - arXiv preprint arXiv:2202.07079, 2022 - arxiv.org
This paper presents a new dynamic approach to experiment design in settings where, due to
interference or other concerns, experimental units are coarse. 'Region-split' experiments on …