Related articles - Academic Resource Search

Policy learning with adaptively collected data

R Zhan, Z Ren, S Athey, Z Zhou - Management Science, 2023 - pubsonline.informs.org

In a wide variety of applications, including healthcare, bidding in first price auctions, digital
recommendations, and online education, it can be beneficial to learn a policy that assigns …

Cited by 26 · Related articles · All 7 versions

[PDF] arxiv.org

Distributionally robust batch contextual bandits

N Si, F Zhang, Z Zhou, J Blanchet - Management Science, 2023 - pubsonline.informs.org

Policy learning using historical observational data is an important problem that has
widespread applications. Examples include selecting offers, prices, or advertisements for …

Cited by 32 · Related articles · All 8 versions

[PDF] arxiv.org

Offline multi-action policy learning: Generalization and optimization

Z Zhou, S Athey, S Wager - Operations Research, 2023 - pubsonline.informs.org

In many settings, a decision maker wishes to learn a rule, or policy, that maps from
observable characteristics of an individual to an action. Examples include selecting offers …

Cited by 174 · Related articles · All 11 versions

[PDF] arxiv.org

More efficient policy learning via optimal retargeting

N Kallus - Journal of the American Statistical Association, 2021 - Taylor & Francis

Policy learning can be used to extract individualized treatment regimes from observational
data in healthcare, civics, e-commerce, and beyond. One big hurdle to policy learning is a …

Cited by 45 · Related articles · All 9 versions

[PDF] neurips.cc

Risk minimization from adaptively collected data: Guarantees for supervised and policy learning

A Bibaut, N Kallus, M Dimakopoulou… - Advances in neural …, 2021 - proceedings.neurips.cc

Empirical risk minimization (ERM) is the workhorse of machine learning, whether for
classification and regression or for off-policy policy learning, but its model-agnostic …

Cited by 16 · Related articles · All 17 versions

[PDF] mlr.press

Inverse contextual bandits: Learning how behavior evolves over time

A Hüyük, D Jarrett… - … Conference on Machine …, 2022 - proceedings.mlr.press

Understanding a decision-maker's priorities by observing their behavior is critical for
transparency and accountability in decision processes, such as in healthcare. Though …

Cited by 15 · Related articles · All 4 versions

[PDF] mlr.press

Generalizing off-policy learning under sample selection bias

T Hatt, D Tschernutter… - Uncertainty in Artificial …, 2022 - proceedings.mlr.press

Learning personalized decision policies that generalize to the target population is of great
relevance. Since training data is often not representative of the target population, standard …

Cited by 21 · Related articles · All 6 versions

[PDF] mlr.press

Stateful offline contextual policy evaluation and learning

N Kallus, A Zhou - International Conference on Artificial …, 2022 - proceedings.mlr.press

We study off-policy evaluation and learning from sequential data in a structured class of
Markov decision processes that arise from repeated interactions with an exogenous …

Cited by 9 · Related articles · All 6 versions

[PDF] arxiv.org

Invariant policy learning: A causal perspective

S Saengkyongam, N Thams, J Peters… - IEEE transactions on …, 2023 - ieeexplore.ieee.org

Contextual bandit and reinforcement learning algorithms have been successfully used in
various interactive learning systems such as online advertising, recommender systems, and …

Cited by 20 · Related articles · All 10 versions

[PDF] acm.org

Anytime-valid off-policy inference for contextual bandits

I Waudby-Smith, L Wu, A Ramdas… - ACM/IMS Journal of …, 2024 - dl.acm.org

Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in
healthcare and the tech industry. They involve online learning algorithms that adaptively …

Cited by 29 · Related articles · All 5 versions
