Anytime-valid off-policy inference for contextual bandits

I Waudby-Smith, L Wu, A Ramdas… - ACM/IMS Journal of …, 2024 - dl.acm.org
Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in
healthcare and the tech industry. They involve online learning algorithms that adaptively …

Statistical inference with m-estimators on adaptively collected data

K Zhang, L Janson, S Murphy - Advances in neural …, 2021 - proceedings.neurips.cc
Bandit algorithms are increasingly used in real-world sequential decision-making problems.
Associated with this is an increased desire to be able to use the resulting datasets to answer …

OpenML benchmarking suites

B Bischl, G Casalicchio, M Feurer, P Gijsbers… - arXiv preprint arXiv …, 2017 - arxiv.org
Machine learning research depends on objectively interpretable, comparable, and
reproducible algorithm benchmarks. We advocate the use of curated, comprehensive suites …

Multi-armed bandit experimental design: Online decision-making and adaptive inference

D Simchi-Levi, C Wang - International Conference on …, 2023 - proceedings.mlr.press
Multi-armed bandit has been well-known for its efficiency in online decision-making in terms
of minimizing the loss of the participants' welfare during experiments (i.e., the regret). In …

A Primer on the Analysis of Randomized Experiments and a Survey of some Recent Advances

Y Bai, AM Shaikh, M Tabord-Meehan - arXiv preprint arXiv:2405.03910, 2024 - arxiv.org
The past two decades have witnessed a surge of new research in the analysis of
randomized experiments. The emergence of this literature may seem surprising given the …

Optimal treatment allocation for efficient policy evaluation in sequential decision making

T Li, C Shi, J Wang, F Zhou - Advances in Neural …, 2024 - proceedings.neurips.cc
A/B testing is critical for modern technological companies to evaluate the effectiveness of
newly developed products against standard baselines. This paper studies optimal designs …

Online multi-armed bandits with adaptive inference

M Dimakopoulou, Z Ren… - Advances in Neural …, 2021 - proceedings.neurips.cc
During online decision making in Multi-Armed Bandits (MAB), one needs to conduct
inference on the true mean reward of each arm based on data collected so far at each step …

Online statistical inference for matrix contextual bandit

Q Han, WW Sun, Y Zhang - arXiv preprint arXiv:2212.11385, 2022 - arxiv.org
Contextual bandit has been widely used for sequential decision-making based on the
current contextual information and historical feedback data. In modern applications, such …

Did we personalize? Assessing personalization by an online reinforcement learning algorithm using resampling

S Ghosh, R Kim, P Chhabria, R Dwivedi, P Klasnja… - Machine Learning, 2024 - Springer
There is a growing interest in using reinforcement learning (RL) to personalize sequences of
treatments in digital health to support users in adopting healthier behaviors. Such sequential …

Synthetically controlled bandits

V Farias, C Moallemi, T Peng, A Zheng - arXiv preprint arXiv:2202.07079, 2022 - arxiv.org
This paper presents a new dynamic approach to experiment design in settings where, due to
interference or other concerns, experimental units are coarse. 'Region-split' experiments on …