Semi-parametric efficient policy learning with continuous actions

V Chernozhukov, M Demirer… - Advances in Neural …, 2019 - proceedings.neurips.cc
We consider off-policy evaluation and optimization with continuous action spaces. We focus
on observational data where the data collection policy is unknown and needs to be …

Semi-parametric efficient policy learning with continuous actions

M Demirer, V Syrgkanis, G Lewis… - arXiv preprint arXiv …, 2019 - arxiv.org
We consider off-policy evaluation and optimization with continuous action spaces. We focus
on observational data where the data collection policy is unknown and needs to be …

Offline multi-action policy learning: Generalization and optimization

Z Zhou, S Athey, S Wager - Operations Research, 2023 - pubsonline.informs.org
In many settings, a decision maker wishes to learn a rule, or policy, that maps from
observable characteristics of an individual to an action. Examples include selecting offers …

Doubly robust off-policy value and gradient estimation for deterministic policies

N Kallus, M Uehara - Advances in Neural Information …, 2020 - proceedings.neurips.cc
Offline reinforcement learning, wherein one uses off-policy data logged by a fixed behavior
policy to evaluate and learn new policies, is crucial in applications where experimentation is …

Generalizing off-policy learning under sample selection bias

T Hatt, D Tschernutter… - Uncertainty in Artificial …, 2022 - proceedings.mlr.press
Learning personalized decision policies that generalize to the target population is of great
relevance. Since training data is often not representative of the target population, standard …

Doubly robust policy evaluation and learning

M Dudík, J Langford, L Li - arXiv preprint arXiv:1103.4601, 2011 - arxiv.org
We study decision making in environments where the reward is only partially observed, but
can be modeled as a function of an action and an observed context. This setting, known as …

Balanced policy evaluation and learning

N Kallus - Advances in neural information processing …, 2018 - proceedings.neurips.cc
We present a new approach to the problems of evaluating and learning personalized
decision policies from observational data of past contexts, decisions, and outcomes. Only …

Off-policy evaluation in infinite-horizon reinforcement learning with latent confounders

A Bennett, N Kallus, L Li… - … Conference on Artificial …, 2021 - proceedings.mlr.press
Off-policy evaluation (OPE) in reinforcement learning is an important problem in settings
where experimentation is limited, such as healthcare. But, in these very same settings …

Policy learning with observational data

S Athey, S Wager - Econometrica, 2021 - Wiley Online Library
In many areas, practitioners seek to use observational data to learn a treatment assignment
policy that satisfies application-specific constraints, such as budget, fairness, simplicity, or …

Variance-aware off-policy evaluation with linear function approximation

Y Min, T Wang, D Zhou, Q Gu - Advances in neural …, 2021 - proceedings.neurips.cc
We study the off-policy evaluation (OPE) problem in reinforcement learning with linear
function approximation, which aims to estimate the value function of a target policy based on …