Off-policy evaluation via adaptive weighting with data from contextual bandits

I Waudby-Smith, L Wu, A Ramdas… - ACM/IMS Journal of …, 2022 - dl.acm.org

Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in
healthcare and the tech industry. They involve online learning algorithms that adaptively …

Cited by 33 Related articles All 5 versions

Statistical inference with M-estimators on adaptively collected data

K Zhang, L Janson, S Murphy - Advances in neural …, 2021 - proceedings.neurips.cc

Bandit algorithms are increasingly used in real-world sequential decision-making problems.
Associated with this is an increased desire to be able to use the resulting datasets to answer …

Cited by 51 Related articles All 15 versions

Multi-armed bandit experimental design: Online decision-making and adaptive inference

D Simchi-Levi, C Wang - International Conference on …, 2023 - proceedings.mlr.press

Multi-armed bandit has been well-known for its efficiency in online decision-making in terms
of minimizing the loss of the participants' welfare during experiments (i.e., the regret). In …

Cited by 19 Related articles All 2 versions

Post-contextual-bandit inference

A Bibaut, M Dimakopoulou, N Kallus… - Advances in neural …, 2021 - proceedings.neurips.cc

Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in
e-commerce, healthcare, and policymaking because they can both improve outcomes for …

Cited by 40 Related articles All 14 versions

Uncertainty-aware instance reweighting for off-policy learning

X Zhang, J Chen, H Wang, H Xie… - Advances in Neural …, 2023 - proceedings.neurips.cc

Off-policy learning, referring to the procedure of policy optimization with access only to
logged feedback data, has shown importance in various important real-world applications …

Cited by 2 Related articles All 2 versions

Proportional response: Contextual bandits for simple and cumulative regret minimization

SK Krishnamurthy, R Zhan, S Athey… - Advances in Neural …, 2023 - proceedings.neurips.cc

In many applications, e.g., in healthcare and e-commerce, the goal of a contextual bandit may
be to learn an optimal treatment assignment policy at the end of the experiment. That is, to …

Cited by 3 Related articles All 5 versions

On instance-dependent bounds for offline reinforcement learning with linear function approximation

T Nguyen-Tang, M Yin, S Gupta, S Venkatesh… - Proceedings of the …, 2023 - ojs.aaai.org

Sample-efficient offline reinforcement learning (RL) with linear function approximation has
been studied extensively recently. Much of the prior work has yielded instance-independent …

Cited by 14 Related articles All 7 versions

Policy learning with adaptively collected data

R Zhan, Z Ren, S Athey, Z Zhou - Management Science, 2023 - pubsonline.informs.org

In a wide variety of applications, including healthcare, bidding in first price auctions, digital
recommendations, and online education, it can be beneficial to learn a policy that assigns …

Cited by 27 Related articles All 7 versions

Non-stationary experimental design under linear trends

D Simchi-Levi, C Wang… - Advances in Neural …, 2023 - proceedings.neurips.cc

Experimentation has been critical and increasingly popular across various domains, such as
clinical trials and online platforms, due to its widely recognized benefits. One of the primary …

Cited by 2 Related articles All 2 versions

Online multi-armed bandits with adaptive inference

M Dimakopoulou, Z Ren… - Advances in Neural …, 2021 - proceedings.neurips.cc

During online decision making in Multi-Armed Bandits (MAB), one needs to conduct
inference on the true mean reward of each arm based on data collected so far at each step …

Cited by 27 Related articles All 7 versions