Semi-parametric efficient policy learning with continuous actions
V Chernozhukov, M Demirer… - Advances in Neural …, 2019 - proceedings.neurips.cc
We consider off-policy evaluation and optimization with continuous action spaces. We focus
on observational data where the data collection policy is unknown and needs to be …
Offline multi-action policy learning: Generalization and optimization
In many settings, a decision maker wishes to learn a rule, or policy, that maps from
observable characteristics of an individual to an action. Examples include selecting offers …
Doubly robust off-policy value and gradient estimation for deterministic policies
Offline reinforcement learning, wherein one uses off-policy data logged by a fixed behavior
policy to evaluate and learn new policies, is crucial in applications where experimentation is …
Generalizing off-policy learning under sample selection bias
T Hatt, D Tschernutter… - Uncertainty in Artificial …, 2022 - proceedings.mlr.press
Learning personalized decision policies that generalize to the target population is of great
relevance. Since training data is often not representative of the target population, standard …
Doubly robust policy evaluation and learning
We study decision making in environments where the reward is only partially observed, but
can be modeled as a function of an action and an observed context. This setting, known as …
Balanced policy evaluation and learning
N Kallus - Advances in neural information processing …, 2018 - proceedings.neurips.cc
We present a new approach to the problems of evaluating and learning personalized
decision policies from observational data of past contexts, decisions, and outcomes. Only …
Off-policy evaluation in infinite-horizon reinforcement learning with latent confounders
Off-policy evaluation (OPE) in reinforcement learning is an important problem in settings
where experimentation is limited, such as healthcare. But, in these very same settings …
Policy learning with observational data
In many areas, practitioners seek to use observational data to learn a treatment assignment
policy that satisfies application‐specific constraints, such as budget, fairness, simplicity, or …
Variance-aware off-policy evaluation with linear function approximation
We study the off-policy evaluation (OPE) problem in reinforcement learning with linear
function approximation, which aims to estimate the value function of a target policy based on …