Off-policy evaluation via adaptive weighting with data from contextual bandits

L Shi, J Wang, T Wu - International Conference on Machine …, 2023 - proceedings.mlr.press

Multi-armed bandit (MAB) algorithms have been increasingly used to complement or
integrate with A/B tests and randomized clinical trials in e-commerce, healthcare, and …

Cited by 4 Related articles All 6 versions

[PDF] neurips.cc

Adaptive linear estimating equations

M Ying, K Khamaru, CH Zhang - Advances in Neural …, 2024 - proceedings.neurips.cc

Sequential data collection has emerged as a widely adopted technique for enhancing the
efficiency of data gathering processes. Despite its advantages, such data collection …

Cited by 3 Related articles All 11 versions

[PDF] arxiv.org

Off-policy evaluation beyond overlap: partial identification through smoothness

S Khan, M Saveski, J Ugander - arXiv preprint arXiv:2305.11812, 2023 - arxiv.org

Off-policy evaluation (OPE) is the problem of estimating the value of a target policy using
historical data collected under a different logging policy. OPE methods typically assume …

Cited by 7 Related articles All 2 versions

[PDF] arxiv.org

Causal reinforcement learning: An instrumental variable approach

J Li, Y Luo, X Zhang - arXiv preprint arXiv:2103.04021, 2021 - arxiv.org

In the standard data analysis framework, data is first collected (once for all), and then data
analysis is carried out. Moreover, the data-generating process is typically assumed to be …

Cited by 19 Related articles All 6 versions

[PDF] harvard.edu

[PDF] Statistical inference after adaptive sampling in non-Markovian environments

KW Zhang, L Janson… - arXiv preprint arXiv …, 2022 - lucasjanson.fas.harvard.edu

There is a great desire to use adaptive sampling methods, such as reinforcement learning
(RL) and bandit algorithms, for the real-time personalization of interventions in digital …

Cited by 15 Related articles

[PDF] arxiv.org

Battling the coronavirus 'infodemic' among social media users in Kenya and Nigeria

M Offer-Westort, LR Rosenzweig, S Athey - Nature Human Behaviour, 2024 - nature.com

How can we induce social media users to be discerning when sharing information during a
pandemic? An experiment on Facebook Messenger with users from Kenya (n = 7,498) and …

Cited by 9 Related articles All 8 versions

[PDF] arxiv.org

Counterfactual inference for sequential experiments

R Dwivedi, K Tian, S Tomkins, P Klasnja… - arXiv preprint arXiv …, 2022 - arxiv.org

We consider after-study statistical inference for sequentially designed experiments wherein
multiple units are assigned treatments for multiple time points using treatment policies that …

Cited by 13 Related articles All 3 versions

[PDF] neurips.cc

Uncertainty-aware instance reweighting for off-policy learning

X Zhang, J Chen, H Wang, H Xie… - Advances in Neural …, 2023 - proceedings.neurips.cc

Off-policy learning, referring to the procedure of policy optimization with access only to
logged feedback data, has shown importance in various important real-world applications …

Cited by 3 Related articles All 5 versions

[PDF] arxiv.org

Double/debiased machine learning for dynamic treatment effects via g-estimation

G Lewis, V Syrgkanis - arXiv preprint arXiv:2002.07285, 2020 - arxiv.org

We consider the estimation of treatment effects in settings when multiple treatments are
assigned over time and treatments can have a causal effect on future outcomes or the state …

Cited by 23 Related articles All 4 versions

[PDF] neurips.cc

Statistical limits of adaptive linear models: low-dimensional estimation and inference

L Lin, M Ying, S Ghosh, K Khamaru… - Advances in Neural …, 2024 - proceedings.neurips.cc

Estimation and inference in statistics pose significant challenges when data are collected
adaptively. Even in linear models, the Ordinary Least Squares (OLS) estimator may fail to …

Cited by 1 Related articles All 8 versions