A Lyapunov-based approach to safe reinforcement learning Y Chow, O Nachum, E Duenez-Guzman, M Ghavamzadeh Advances in Neural Information Processing Systems 31, 2018 | 549 | 2018 |
Risk-constrained reinforcement learning with percentile risk criteria Y Chow, M Ghavamzadeh, L Janson, M Pavone Journal of Machine Learning Research 18 (167), 1-51, 2018 | 536 | 2018 |
Algorithms for CVaR optimization in MDPs Y Chow, M Ghavamzadeh Advances in Neural Information Processing Systems 27, 2014 | 404 | 2014 |
Risk-sensitive and robust decision-making: a CVaR optimization approach Y Chow, A Tamar, S Mannor, M Pavone Advances in Neural Information Processing Systems 28, 2015 | 358 | 2015 |
DualDICE: Behavior-agnostic estimation of discounted stationary distribution corrections O Nachum, Y Chow, B Dai, L Li Advances in Neural Information Processing Systems 32, 2019 | 324 | 2019 |
More robust doubly robust off-policy evaluation M Farajtabar, Y Chow, M Ghavamzadeh International Conference on Machine Learning, 1447-1456, 2018 | 265 | 2018 |
Lyapunov-based safe policy optimization for continuous control Y Chow, O Nachum, A Faust, E Duenez-Guzman, M Ghavamzadeh arXiv preprint arXiv:1901.10031, 2019 | 251 | 2019 |
AlgaeDICE: Policy gradient from arbitrary experience O Nachum, B Dai, I Kostrikov, Y Chow, L Li, D Schuurmans arXiv preprint arXiv:1912.02074, 2019 | 234 | 2019 |
Safe policy improvement by minimizing robust baseline regret M Ghavamzadeh, M Petrik, Y Chow Advances in Neural Information Processing Systems 29, 2016 | 149 | 2016 |
Policy gradient for coherent risk measures A Tamar, Y Chow, M Ghavamzadeh, S Mannor Advances in Neural Information Processing Systems 28, 2015 | 135 | 2015 |
CoinDICE: Off-policy confidence interval estimation B Dai, O Nachum, Y Chow, L Li, C Szepesvári, D Schuurmans Advances in Neural Information Processing Systems 33, 9398-9411, 2020 | 81 | 2020 |
Sequential decision making with coherent risk A Tamar, Y Chow, M Ghavamzadeh, S Mannor IEEE Transactions on Automatic Control 62 (7), 3323-3338, 2016 | 80 | 2016 |
A framework for time-consistent, risk-sensitive model predictive control: Theory and algorithms S Singh, Y Chow, A Majumdar, M Pavone IEEE Transactions on Automatic Control 64 (7), 2905-2912, 2018 | 67 | 2018 |
Online modified greedy algorithm for storage control under uncertainty J Qin, Y Chow, J Yang, R Rajagopal IEEE Transactions on Power Systems 31 (3), 1729-1743, 2015 | 63 | 2015 |
CAQL: Continuous action Q-learning M Ryu, Y Chow, R Anderson, C Tjandraatmadja, C Boutilier arXiv preprint arXiv:1909.12397, 2019 | 53 | 2019 |
Weighted SGD for Regression with Randomized Preconditioning J Yang, YL Chow, C Ré, MW Mahoney Journal of Machine Learning Research 18 (211), 1-43, 2018 | 52 | 2018 |
Latent bandits revisited J Hong, B Kveton, M Zaheer, Y Chow, A Ahmed, C Boutilier Advances in Neural Information Processing Systems 33, 13423-13433, 2020 | 48 | 2020 |
A framework for time-consistent, risk-averse model predictive control: Theory and algorithms YL Chow, M Pavone 2014 American Control Conference, 4204-4211, 2014 | 44 | 2014 |
Distributed online modified greedy algorithm for networked storage operation under uncertainty J Qin, Y Chow, J Yang, R Rajagopal IEEE Transactions on Smart Grid 7 (2), 1106-1118, 2015 | 42 | 2015 |
Path consistency learning in Tsallis entropy regularized MDPs Y Chow, O Nachum, M Ghavamzadeh International Conference on Machine Learning, 979-988, 2018 | 37 | 2018 |