Minimax value interval for off-policy evaluation and policy optimization

N Jiang, J Huang - Advances in Neural Information …, 2020 - proceedings.neurips.cc
We study minimax methods for off-policy evaluation (OPE) using value functions and
marginalized importance weights. Despite that they hold promises of overcoming the …

[PDF] Minimax Value Interval for Off-Policy Evaluation and Policy Optimization

N Jiang, J Huang - papers.neurips.cc
Minimax Value Interval for Off-Policy Evaluation and Policy Optimization. Nan Jiang, Department of …

Minimax value interval for off-policy evaluation and policy optimization

N Jiang, J Huang - Advances in Neural Information Processing …, 2020 - experts.illinois.edu
We study minimax methods for off-policy evaluation (OPE) using value functions and
marginalized importance weights. Despite that they hold promises of overcoming the …

Minimax value interval for off-policy evaluation and policy optimization

N Jiang, J Huang - Proceedings of the 34th International Conference on …, 2020 - dl.acm.org
We study minimax methods for off-policy evaluation (OPE) using value functions and
marginalized importance weights. Despite that they hold promises of overcoming the …

[PDF] Minimax Value Interval for Off-Policy Evaluation and Policy Optimization

N Jiang, J Huang - proceedings.neurips.cc
Minimax Value Interval for Off-Policy Evaluation and Policy Optimization. Nan Jiang, Department of …

Minimax Value Interval for Off-Policy Evaluation and Policy Optimization

N Jiang, J Huang - arXiv preprint arXiv:2002.02081, 2020 - arxiv.org
We study minimax methods for off-policy evaluation (OPE) using value functions and
marginalized importance weights. Despite that they hold promises of overcoming the …

Minimax Value Interval for Off-Policy Evaluation and Policy Optimization

N Jiang, J Huang - arXiv e-prints, 2020 - ui.adsabs.harvard.edu
We study minimax methods for off-policy evaluation (OPE) using value functions and
marginalized importance weights. Despite that they hold promises of overcoming the …