Anytime-valid off-policy inference for contextual bandits
Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in
healthcare and the tech industry. They involve online learning algorithms that adaptively …
Multi-armed bandit experimental design: Online decision-making and adaptive inference
D Simchi-Levi, C Wang - International Conference on …, 2023 - proceedings.mlr.press
Multi-armed bandit has been well-known for its efficiency in online decision-making in terms
of minimizing the loss of the participants' welfare during experiments (i.e., the regret). In …
Post-contextual-bandit inference
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-
commerce, healthcare, and policymaking because they can both improve outcomes for …
Clip-OGD: An experimental design for adaptive Neyman allocation in sequential experiments
From clinical development of cancer therapies to investigations into partisan bias, adaptive
sequential designs have become an increasingly popular method for causal inference, as they …
A Primer on the Analysis of Randomized Experiments and a Survey of some Recent Advances
The past two decades have witnessed a surge of new research in the analysis of
randomized experiments. The emergence of this literature may seem surprising given the …
Microrandomized trials: developing just-in-time adaptive interventions for better public health
Just-in-time adaptive interventions (JITAIs) represent an intervention design that adapts the
provision and type of support over time to an individual's changing status and contexts …
On instance-dependent bounds for offline reinforcement learning with linear function approximation
Sample-efficient offline reinforcement learning (RL) with linear function approximation has
been studied extensively recently. Much of the prior work has yielded instance-independent …
Adaptive instrument design for indirect experiments
Y Chandak, S Shankar, V Syrgkanis… - The Twelfth International …, 2023 - openreview.net
Indirect experiments provide a valuable framework for estimating treatment effects in
situations where conducting randomized control trials (RCTs) is impractical or unethical …
Adaptive principal component regression with applications to panel data
Principal component regression (PCR) is a popular technique for fixed-design error-in-
variables regression, a generalization of the linear regression setting in which the observed …
A lower bound for linear and kernel regression with adaptive covariates
T Lattimore - The Thirty Sixth Annual Conference on …, 2023 - proceedings.mlr.press
We prove that the continuous time version of the concentration bounds by Abbasi-Yadkori et
al. (2011) for adaptive linear regression cannot be improved in general, showing that there …