Statistical inference on multi-armed bandits with delayed feedback

L Shi, J Wang, T Wu - International Conference on Machine …, 2023 - proceedings.mlr.press
Multi-armed bandit (MAB) algorithms have been increasingly used to complement or
integrate with A/B tests and randomized clinical trials in e-commerce, healthcare, and …

Adaptive linear estimating equations

M Ying, K Khamaru, CH Zhang - Advances in Neural …, 2024 - proceedings.neurips.cc
Sequential data collection has emerged as a widely adopted technique for enhancing the
efficiency of data gathering processes. Despite its advantages, such data collection …

Off-policy evaluation beyond overlap: partial identification through smoothness

S Khan, M Saveski, J Ugander - arXiv preprint arXiv:2305.11812, 2023 - arxiv.org
Off-policy evaluation (OPE) is the problem of estimating the value of a target policy using
historical data collected under a different logging policy. OPE methods typically assume …

Causal reinforcement learning: An instrumental variable approach

J Li, Y Luo, X Zhang - arXiv preprint arXiv:2103.04021, 2021 - arxiv.org
In the standard data analysis framework, data is first collected (once for all), and then data
analysis is carried out. Moreover, the data-generating process is typically assumed to be …

Statistical inference after adaptive sampling in non-markovian environments

KW Zhang, L Janson… - arXiv preprint arXiv …, 2022 - lucasjanson.fas.harvard.edu
There is a great desire to use adaptive sampling methods, such as reinforcement learning
(RL) and bandit algorithms, for the real-time personalization of interventions in digital …

Battling the coronavirus 'infodemic' among social media users in Kenya and Nigeria

M Offer-Westort, LR Rosenzweig, S Athey - Nature Human Behaviour, 2024 - nature.com
How can we induce social media users to be discerning when sharing information during a
pandemic? An experiment on Facebook Messenger with users from Kenya (n = 7,498) and …

Counterfactual inference for sequential experiments

R Dwivedi, K Tian, S Tomkins, P Klasnja… - arXiv preprint arXiv …, 2022 - arxiv.org
We consider after-study statistical inference for sequentially designed experiments wherein
multiple units are assigned treatments for multiple time points using treatment policies that …

Uncertainty-aware instance reweighting for off-policy learning

X Zhang, J Chen, H Wang, H Xie… - Advances in Neural …, 2023 - proceedings.neurips.cc
Off-policy learning, referring to the procedure of policy optimization with access only to
logged feedback data, has shown importance in various important real-world applications …

Double/debiased machine learning for dynamic treatment effects via g-estimation

G Lewis, V Syrgkanis - arXiv preprint arXiv:2002.07285, 2020 - arxiv.org
We consider the estimation of treatment effects in settings when multiple treatments are
assigned over time and treatments can have a causal effect on future outcomes or the state …

Statistical limits of adaptive linear models: low-dimensional estimation and inference

L Lin, M Ying, S Ghosh, K Khamaru… - Advances in Neural …, 2024 - proceedings.neurips.cc
Estimation and inference in statistics pose significant challenges when data are collected
adaptively. Even in linear models, the Ordinary Least Squares (OLS) estimator may fail to …