Policy learning with adaptively collected data

R Zhan, Z Ren, S Athey, Z Zhou - Management Science, 2023 - pubsonline.informs.org
In a wide variety of applications, including healthcare, bidding in first price auctions, digital
recommendations, and online education, it can be beneficial to learn a policy that assigns …

Distributionally robust batch contextual bandits

N Si, F Zhang, Z Zhou, J Blanchet - Management Science, 2023 - pubsonline.informs.org
Policy learning using historical observational data is an important problem that has
widespread applications. Examples include selecting offers, prices, or advertisements for …

Offline multi-action policy learning: Generalization and optimization

Z Zhou, S Athey, S Wager - Operations Research, 2023 - pubsonline.informs.org
In many settings, a decision maker wishes to learn a rule, or policy, that maps from
observable characteristics of an individual to an action. Examples include selecting offers …

More efficient policy learning via optimal retargeting

N Kallus - Journal of the American Statistical Association, 2021 - Taylor & Francis
Policy learning can be used to extract individualized treatment regimes from observational
data in healthcare, civics, e-commerce, and beyond. One big hurdle to policy learning is a …

Risk minimization from adaptively collected data: Guarantees for supervised and policy learning

A Bibaut, N Kallus, M Dimakopoulou… - Advances in neural …, 2021 - proceedings.neurips.cc
Empirical risk minimization (ERM) is the workhorse of machine learning, whether for
classification and regression or for off-policy policy learning, but its model-agnostic …

Inverse contextual bandits: Learning how behavior evolves over time

A Hüyük, D Jarrett… - … Conference on Machine …, 2022 - proceedings.mlr.press
Understanding a decision-maker's priorities by observing their behavior is critical for
transparency and accountability in decision processes, such as in healthcare. Though …

Generalizing off-policy learning under sample selection bias

T Hatt, D Tschernutter… - Uncertainty in Artificial …, 2022 - proceedings.mlr.press
Learning personalized decision policies that generalize to the target population is of great
relevance. Since training data is often not representative of the target population, standard …

Stateful offline contextual policy evaluation and learning

N Kallus, A Zhou - International Conference on Artificial …, 2022 - proceedings.mlr.press
We study off-policy evaluation and learning from sequential data in a structured class of
Markov decision processes that arise from repeated interactions with an exogenous …

Invariant policy learning: A causal perspective

S Saengkyongam, N Thams, J Peters… - IEEE transactions on …, 2023 - ieeexplore.ieee.org
Contextual bandit and reinforcement learning algorithms have been successfully used in
various interactive learning systems such as online advertising, recommender systems, and …

Anytime-valid off-policy inference for contextual bandits

I Waudby-Smith, L Wu, A Ramdas… - ACM/IMS Journal of …, 2024 - dl.acm.org
Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in
healthcare and the tech industry. They involve online learning algorithms that adaptively …