Double reinforcement learning for efficient off-policy evaluation in Markov decision processes. N. Kallus, M. Uehara. Journal of Machine Learning Research 21(167):1–63, 2020. Cited by 194.
Minimax weight and Q-function learning for off-policy evaluation. M. Uehara, J. Huang, N. Jiang. International Conference on Machine Learning, pp. 9659–9668, 2020. Cited by 181.
Pessimistic model-based offline reinforcement learning under partial coverage. M. Uehara, W. Sun. International Conference on Learning Representations, 2022. Cited by 139.
Representation learning for online and offline RL in low-rank MDPs. M. Uehara, X. Zhang, W. Sun. International Conference on Learning Representations, 2022. Cited by 131.
Generative adversarial nets from a density ratio estimation perspective. M. Uehara, I. Sato, M. Suzuki, K. Nakayama, Y. Matsuo. arXiv preprint arXiv:1610.02920, 2016. Cited by 103.
Efficiently breaking the curse of horizon in off-policy evaluation with double reinforcement learning. N. Kallus, M. Uehara. Operations Research 70(6):3282–3302, 2022. Cited by 97*.
Mitigating covariate shift in imitation learning via offline data with partial coverage. J. Chang, M. Uehara, D. Sreenivas, R. Kidambi, W. Sun. Advances in Neural Information Processing Systems 34, pp. 965–979, 2021. Cited by 82.
Efficient reinforcement learning in block MDPs: A model-free representation learning approach. X. Zhang, Y. Song, M. Uehara, M. Wang, A. Agarwal, W. Sun. International Conference on Machine Learning, pp. 26517–26547, 2022. Cited by 61.
Causal inference under unmeasured confounding with negative controls: A minimax learning approach. N. Kallus, X. Mao, M. Uehara. arXiv preprint arXiv:2103.14029, 2021. Cited by 61.
Finite sample analysis of minimax offline reinforcement learning: Completeness, fast rates and first-order efficiency. M. Uehara, M. Imaizumi, N. Jiang, N. Kallus, W. Sun, T. Xie. arXiv preprint arXiv:2102.02981, 2021. Cited by 60.
Intrinsically efficient, stable, and bounded off-policy evaluation for reinforcement learning. N. Kallus, M. Uehara. Advances in Neural Information Processing Systems 32, 2019. Cited by 54.
A review of off-policy evaluation in reinforcement learning. M. Uehara, C. Shi, N. Kallus. arXiv preprint arXiv:2212.06355, 2022. Cited by 45.
Off-policy evaluation and learning for external validity under a covariate shift. M. Uehara, M. Kato, S. Yasui. Advances in Neural Information Processing Systems 33, pp. 49–61, 2020. Cited by 45*.
Statistically efficient off-policy policy gradients. N. Kallus, M. Uehara. International Conference on Machine Learning, pp. 5089–5100, 2020. Cited by 42.
PAC reinforcement learning for predictive state representations. W. Zhan, M. Uehara, W. Sun, J. D. Lee. International Conference on Learning Representations, 2023. Cited by 38.
A minimax learning approach to off-policy evaluation in confounded partially observable Markov decision processes. C. Shi, M. Uehara, J. Huang, N. Jiang. International Conference on Machine Learning, pp. 20057–20094, 2022. Cited by 36.
Provably efficient reinforcement learning in partially observable dynamical systems. M. Uehara, A. Sekhari, J. D. Lee, N. Kallus, W. Sun. Advances in Neural Information Processing Systems 35, pp. 578–592, 2022. Cited by 34.
Localized debiased machine learning: Efficient inference on quantile treatment effects and beyond. N. Kallus, X. Mao, M. Uehara. Journal of Machine Learning Research 25(16):1–59, 2024. Cited by 30*.
Optimal off-policy evaluation from multiple logging policies. N. Kallus, Y. Saito, M. Uehara. International Conference on Machine Learning, pp. 5247–5256, 2021. Cited by 29.
Provable offline reinforcement learning with human feedback. W. Zhan, M. Uehara, N. Kallus, J. D. Lee, W. Sun. ICML 2023 Workshop: The Many Facets of Preference-Based Learning, 2023. Cited by 25.