Anytime-valid off-policy inference for contextual bandits
Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in
healthcare and the tech industry. They involve online learning algorithms that adaptively …
Statistical inference with m-estimators on adaptively collected data
Bandit algorithms are increasingly used in real-world sequential decision-making problems.
Associated with this is an increased desire to be able to use the resulting datasets to answer …
Openml benchmarking suites
Machine learning research depends on objectively interpretable, comparable, and
reproducible algorithm benchmarks. We advocate the use of curated, comprehensive suites …
Multi-armed bandit experimental design: Online decision-making and adaptive inference
D Simchi-Levi, C Wang - International Conference on …, 2023 - proceedings.mlr.press
Multi-armed bandit has been well-known for its efficiency in online decision-making in terms
of minimizing the loss of the participants' welfare during experiments (i.e., the regret). In …
A Primer on the Analysis of Randomized Experiments and a Survey of some Recent Advances
The past two decades have witnessed a surge of new research in the analysis of
randomized experiments. The emergence of this literature may seem surprising given the …
Optimal treatment allocation for efficient policy evaluation in sequential decision making
A/B testing is critical for modern technological companies to evaluate the effectiveness of
newly developed products against standard baselines. This paper studies optimal designs …
Online multi-armed bandits with adaptive inference
M Dimakopoulou, Z Ren… - Advances in Neural …, 2021 - proceedings.neurips.cc
During online decision making in Multi-Armed Bandits (MAB), one needs to conduct
inference on the true mean reward of each arm based on data collected so far at each step …
Online statistical inference for matrix contextual bandit
Contextual bandit has been widely used for sequential decision-making based on the
current contextual information and historical feedback data. In modern applications, such …
Did we personalize? Assessing personalization by an online reinforcement learning algorithm using resampling
There is a growing interest in using reinforcement learning (RL) to personalize sequences of
treatments in digital health to support users in adopting healthier behaviors. Such sequential …
Correlated cluster-based randomized experiments: Robust variance minimization
Experimentation is prevalent in online marketplaces and social networks to assess the
effectiveness of new market interventions. To mitigate the interference among users in an …