Policy learning with adaptively collected data
In a wide variety of applications, including healthcare, bidding in first price auctions, digital
recommendations, and online education, it can be beneficial to learn a policy that assigns …
Distributionally robust batch contextual bandits
Policy learning using historical observational data is an important problem that has
widespread applications. Examples include selecting offers, prices, or advertisements for …
Offline multi-action policy learning: Generalization and optimization
In many settings, a decision maker wishes to learn a rule, or policy, that maps from
observable characteristics of an individual to an action. Examples include selecting offers …
More efficient policy learning via optimal retargeting
N Kallus - Journal of the American Statistical Association, 2021 - Taylor & Francis
Policy learning can be used to extract individualized treatment regimes from observational
data in healthcare, civics, e-commerce, and beyond. One big hurdle to policy learning is a …
Risk minimization from adaptively collected data: Guarantees for supervised and policy learning
Empirical risk minimization (ERM) is the workhorse of machine learning, whether for
classification and regression or for off-policy policy learning, but its model-agnostic …
Inverse contextual bandits: Learning how behavior evolves over time
Understanding a decision-maker's priorities by observing their behavior is critical for
transparency and accountability in decision processes, such as in healthcare. Though …
Generalizing off-policy learning under sample selection bias
T Hatt, D Tschernutter… - Uncertainty in Artificial …, 2022 - proceedings.mlr.press
Learning personalized decision policies that generalize to the target population is of great
relevance. Since training data is often not representative of the target population, standard …
Stateful offline contextual policy evaluation and learning
We study off-policy evaluation and learning from sequential data in a structured class of
Markov decision processes that arise from repeated interactions with an exogenous …
Invariant policy learning: A causal perspective
Contextual bandit and reinforcement learning algorithms have been successfully used in
various interactive learning systems such as online advertising, recommender systems, and …
Anytime-valid off-policy inference for contextual bandits
Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in
healthcare and the tech industry. They involve online learning algorithms that adaptively …