Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning

N Kallus, M Uehara - Advances in Neural Information Processing Systems, 2019 - proceedings.neurips.cc
Off-policy evaluation (OPE) in both contextual bandits and reinforcement learning allows
one to evaluate novel decision policies without needing to conduct exploration, which is …
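For readers unfamiliar with the setting, the sketch below illustrates the problem the abstract describes: estimating the value of a new (target) policy from trajectories logged under a different (behavior) policy. It uses a plain per-decision importance-sampling estimator; all names are illustrative, and this is generic textbook OPE, not the estimator proposed in the paper.

```python
# Minimal sketch of off-policy evaluation (OPE) via per-decision importance
# sampling on logged trajectories. Illustrative only: the function names and
# the toy environment are assumptions, and this is not the intrinsically
# efficient, stable, and bounded estimator developed by Kallus and Uehara.
import numpy as np


def per_decision_is(trajectories, target_policy, behavior_policy, gamma=0.99):
    """Estimate the target policy's value from logged data.

    trajectories: list of episodes, each a list of (state, action, reward)
        tuples collected under the behavior policy.
    target_policy(state, action): probability of `action` under the policy
        we want to evaluate.
    behavior_policy(state, action): probability of `action` under the policy
        that generated the data.
    """
    values = []
    for traj in trajectories:
        rho = 1.0  # cumulative importance weight up to time t
        v = 0.0    # discounted, reweighted return of this trajectory
        for t, (s, a, r) in enumerate(traj):
            rho *= target_policy(s, a) / behavior_policy(s, a)
            v += (gamma ** t) * rho * r
        values.append(v)
    return float(np.mean(values))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy two-action problem: the behavior policy is uniform, the target
    # policy prefers action 1, and action 1 yields higher reward.
    behavior = lambda s, a: 0.5
    target = lambda s, a: 0.8 if a == 1 else 0.2

    def simulate_episode(horizon=5):
        traj = []
        for _ in range(horizon):
            s = rng.normal()
            a = int(rng.integers(2))          # logged action from behavior policy
            r = float(a) + 0.1 * rng.normal()
            traj.append((s, a, r))
        return traj

    data = [simulate_episode() for _ in range(2000)]
    print("Estimated target-policy value:", per_decision_is(data, target, behavior))
```

Plain importance sampling of this kind is known to suffer when the cumulative importance weights grow large, which is the sort of instability and unboundedness that motivates estimators that are, as in the title, stable and bounded.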
