All versions - Academic resource search

More robust doubly robust off-policy evaluation

M Farajtabar, Y Chow… - … on Machine Learning, 2018 - proceedings.mlr.press

We study the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where
the goal is to estimate the performance of a policy from the data generated by another policy …

Cited by: 273 Related articles

[PDF] github.io

[PDF] More Robust Doubly Robust Off-policy Evaluation

M Farajtabar, Y Chow, M Ghavamzadeh - mohammadghavamzadeh.github.io

We study the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where
the goal is to estimate the performance of a policy from the data generated by another policy …

More Robust Doubly Robust Off-policy Evaluation

M Farajtabar, Y Chow, M Ghavamzadeh - arXiv e-prints, 2018 - ui.adsabs.harvard.edu

We study the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where
the goal is to estimate the performance of a policy from the data generated by another policy …

[PDF] mlr.press

[PDF] More Robust Doubly Robust Off-policy Evaluation

M Farajtabar, Y Chow, M Ghavamzadeh - proceedings.mlr.press

We study the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where
the goal is to estimate the performance of a policy from the data generated by another policy …