More Robust Doubly Robust Off-policy Evaluation

M Farajtabar, Y Chow… - … on Machine Learning, 2018 - proceedings.mlr.press
We study the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where
the goal is to estimate the performance of a policy from the data generated by another policy …
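The snippet states the off-policy evaluation problem without showing an estimator. What follows is a minimal sketch of the *standard* doubly robust (DR) estimator for the one-step (contextual-bandit) case. It is not the paper's "more robust" variant, which the snippet does not describe. It assumes logged `(context, action, reward)` triples, a known behavior-policy probability for every logged action, and a reward model `q_hat`. The function names and signatures are hypothetical illustrations.

```python
from typing import Callable, Sequence, Tuple

# A logged sample: (context x, action a chosen by the behavior policy, observed reward r).
Sample = Tuple[int, int, float]
# A policy returns the probability pi(a | x); arguments are (action, context).
Policy = Callable[[int, int], float]


def importance_sampling(data: Sequence[Sample], target: Policy, behavior: Policy) -> float:
    """Plain IS estimate: average of rho * r with rho = pi(a|x) / mu(a|x).

    Assumes behavior(a, x) > 0 for every logged (x, a).
    """
    total = 0.0
    for x, a, r in data:
        rho = target(a, x) / behavior(a, x)
        total += rho * r
    return total / len(data)


def doubly_robust(
    data: Sequence[Sample],
    target: Policy,
    behavior: Policy,
    q_hat: Callable[[int, int], float],  # hypothetical reward model q_hat(x, a)
    actions: Sequence[int],
) -> float:
    """Standard DR estimate: model-based term plus an IS-weighted residual correction."""
    total = 0.0
    for x, a, r in data:
        # Direct-method term: model reward averaged over the target policy's actions.
        dm = sum(target(b, x) * q_hat(x, b) for b in actions)
        # Correct the model's error on the logged action, reweighted by rho.
        rho = target(a, x) / behavior(a, x)
        total += dm + rho * (r - q_hat(x, a))
    return total / len(data)


if __name__ == "__main__":
    # Toy log: uniform behavior policy over two actions; action 1 always pays 1.0.
    data = [(0, 0, 0.0), (0, 1, 1.0), (1, 0, 0.0), (1, 1, 1.0)]
    behavior = lambda a, x: 0.5
    target = lambda a, x: 1.0 if a == 1 else 0.0  # always picks action 1
    biased_model = lambda x, a: 0.5                # deliberately wrong reward model
    print(importance_sampling(data, target, behavior))
    print(doubly_robust(data, target, behavior, biased_model, actions=[0, 1]))
```

In this toy log, both estimates recover the target policy's true value of 1.0. The DR estimate does so even with a deliberately biased reward model, because the importance-weighted residual term cancels the model's error in expectation when the behavior probabilities are correct. Conversely, an accurate `q_hat` reduces the variance that the IS weights would otherwise add.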

More Robust Doubly Robust Off-policy Evaluation

M Farajtabar, Y Chow, M Ghavamzadeh - mohammadghavamzadeh.github.io
We study the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where
the goal is to estimate the performance of a policy from the data generated by another policy …

More Robust Doubly Robust Off-policy Evaluation

M Farajtabar, Y Chow, M Ghavamzadeh - arXiv e-prints, 2018 - ui.adsabs.harvard.edu
We study the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where
the goal is to estimate the performance of a policy from the data generated by another policy …

More Robust Doubly Robust Off-policy Evaluation

M Farajtabar, Y Chow, M Ghavamzadeh - arXiv preprint arXiv:1802.03493, 2018 - arxiv.org
We study the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where
the goal is to estimate the performance of a policy from the data generated by another policy …

More Robust Doubly Robust Off-policy Evaluation

M Farajtabar, M Ghavamzadeh, Y Chow - research.google
We study the problem of off-policy value evaluation in reinforcement learning (RL), where
one aims to estimate the value of a new policy based on data collected by a different policy …