Anytime-valid off-policy inference for contextual bandits
Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in
healthcare and the tech industry. They involve online learning algorithms that adaptively …
Multi-armed bandit experimental design: Online decision-making and adaptive inference
D Simchi-Levi, C Wang - International Conference on …, 2023 - proceedings.mlr.press
Multi-armed bandit has been well-known for its efficiency in online decision-making in terms
of minimizing the loss of the participants' welfare during experiments (i.e., the regret). In …
Adaptivity and confounding in multi-armed bandit experiments
We explore a new model of bandit experiments where a potentially nonstationary sequence
of contexts influences arms' performance. Context-unaware algorithms risk confounding …
Non-stationary experimental design under linear trends
D Simchi-Levi, C Wang… - Advances in Neural …, 2023 - proceedings.neurips.cc
Experimentation has been critical and increasingly popular across various domains, such as
clinical trials and online platforms, due to its widely recognized benefits. One of the primary …
Distributionally robust batch contextual bandits
Policy learning using historical observational data is an important problem that has
widespread applications. Examples include selecting offers, prices, or advertisements for …
Non-stationary representation learning in sequential linear bandits
In this paper, we study representation learning for multi-task decision-making in non-
stationary environments. We consider the framework of sequential linear bandits, where the …
Statistical inference on multi-armed bandits with delayed feedback
Multi-armed bandit (MAB) algorithms have been increasingly used to complement or
integrate with A/B tests and randomized clinical trials in e-commerce, healthcare, and …
Adaptive linear estimating equations
Sequential data collection has emerged as a widely adopted technique for enhancing the
efficiency of data gathering processes. Despite its advantages, such data collection …
Multi-armed bandit vs. A/B tests in e-commerce: confidence interval and hypothesis test power perspectives
An emerging dilemma facing practitioners in large-scale online experimentation for e-
commerce is whether to use Multi-Armed Bandit (MAB) algorithms for testing or traditional …
Entropy regularization for population estimation
Entropy regularization is known to improve exploration in sequential decision-making
problems. We show that this same mechanism can also lead to nearly unbiased and lower …