Anytime-valid off-policy inference for contextual bandits
Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in
healthcare and the tech industry. They involve online learning algorithms that adaptively …
Statistical inference with m-estimators on adaptively collected data
Bandit algorithms are increasingly used in real-world sequential decision-making problems.
Associated with this is an increased desire to be able to use the resulting datasets to answer …
Openml benchmarking suites
Machine learning research depends on objectively interpretable, comparable, and
reproducible algorithm benchmarks. We advocate the use of curated, comprehensive suites …
Multi-armed bandit experimental design: Online decision-making and adaptive inference
D Simchi-Levi, C Wang - International Conference on …, 2023 - proceedings.mlr.press
Multi-armed bandit has been well-known for its efficiency in online decision-making in terms
of minimizing the loss of the participants' welfare during experiments (i.e., the regret). In …
A Primer on the Analysis of Randomized Experiments and a Survey of some Recent Advances
The past two decades have witnessed a surge of new research in the analysis of
randomized experiments. The emergence of this literature may seem surprising given the …
Optimal treatment allocation for efficient policy evaluation in sequential decision making
A/B testing is critical for modern technological companies to evaluate the effectiveness of
newly developed products against standard baselines. This paper studies optimal designs …
Online multi-armed bandits with adaptive inference
M Dimakopoulou, Z Ren… - Advances in Neural …, 2021 - proceedings.neurips.cc
During online decision making in Multi-Armed Bandits (MAB), one needs to conduct
inference on the true mean reward of each arm based on data collected so far at each step …
Online statistical inference for matrix contextual bandit
Contextual bandit has been widely used for sequential decision-making based on the
current contextual information and historical feedback data. In modern applications, such …
Did we personalize? Assessing personalization by an online reinforcement learning algorithm using resampling
There is a growing interest in using reinforcement learning (RL) to personalize sequences of
treatments in digital health to support users in adopting healthier behaviors. Such sequential …
Correlated cluster-based randomized experiments: Robust variance minimization
Experimentation is prevalent in online marketplaces and social networks to assess the
effectiveness of new market interventions. To mitigate the interference among users in an …