Semi-parametric efficient policy learning with continuous actions
V Chernozhukov, M Demirer… - Advances in Neural …, 2019 - proceedings.neurips.cc
We consider off-policy evaluation and optimization with continuous action spaces. We focus
on observational data where the data collection policy is unknown and needs to be …
Offline multi-action policy learning: Generalization and optimization
In many settings, a decision maker wishes to learn a rule, or policy, that maps from
observable characteristics of an individual to an action. Examples include selecting offers …
Doubly robust off-policy value and gradient estimation for deterministic policies
Offline reinforcement learning, wherein one uses off-policy data logged by a fixed behavior
policy to evaluate and learn new policies, is crucial in applications where experimentation is …
Generalizing off-policy learning under sample selection bias
T Hatt, D Tschernutter… - Uncertainty in Artificial …, 2022 - proceedings.mlr.press
Learning personalized decision policies that generalize to the target population is of great
relevance. Since training data is often not representative of the target population, standard …
Doubly robust policy evaluation and learning
We study decision making in environments where the reward is only partially observed, but
can be modeled as a function of an action and an observed context. This setting, known as …
Balanced policy evaluation and learning
N Kallus - Advances in neural information processing …, 2018 - proceedings.neurips.cc
We present a new approach to the problems of evaluating and learning personalized
decision policies from observational data of past contexts, decisions, and outcomes. Only …
Off-policy evaluation in infinite-horizon reinforcement learning with latent confounders
Off-policy evaluation (OPE) in reinforcement learning is an important problem in settings
where experimentation is limited, such as healthcare. But, in these very same settings …
Policy learning with observational data
In many areas, practitioners seek to use observational data to learn a treatment assignment
policy that satisfies application‐specific constraints, such as budget, fairness, simplicity, or …
Variance-aware off-policy evaluation with linear function approximation
We study the off-policy evaluation (OPE) problem in reinforcement learning with linear
function approximation, which aims to estimate the value function of a target policy based on …