Bootstrap your own latent: a new approach to self-supervised learning JB Grill, F Strub, F Altché, C Tallec, P Richemond, E Buchatskaya, ... Advances in neural information processing systems 33, 21271-21284, 2020 | 5982 | 2020 |
A distributional perspective on reinforcement learning MG Bellemare, W Dabney, R Munos International conference on machine learning, 449-458, 2017 | 1719 | 2017 |
Unifying count-based exploration and intrinsic motivation M Bellemare, S Srinivasan, G Ostrovski, T Schaul, D Saxton, R Munos Advances in neural information processing systems 29, 2016 | 1654 | 2016 |
Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures L Espeholt, H Soyer, R Munos, K Simonyan, V Mnih, T Ward, Y Doron, ... International conference on machine learning, 1407-1416, 2018 | 1588 | 2018 |
Learning to reinforcement learn JX Wang, Z Kurth-Nelson, D Tirumala, H Soyer, JZ Leibo, R Munos, ... arXiv preprint arXiv:1611.05763, 2016 | 1027 | 2016 |
Sample efficient actor-critic with experience replay Z Wang, V Bapst, N Heess, V Mnih, R Munos, K Kavukcuoglu, ... arXiv preprint arXiv:1611.01224, 2016 | 986 | 2016 |
Best arm identification in multi-armed bandits JY Audibert, S Bubeck COLT - 23rd Conference on Learning Theory, 13 p., 2010 | 922 | 2010 |
Minimax regret bounds for reinforcement learning MG Azar, I Osband, R Munos International conference on machine learning, 263-272, 2017 | 814 | 2017 |
Distributional reinforcement learning with quantile regression W Dabney, M Rowland, M Bellemare, R Munos Proceedings of the AAAI conference on artificial intelligence 32 (1), 2018 | 792 | 2018 |
Exploration–exploitation tradeoff using variance estimates in multi-armed bandits JY Audibert, R Munos, C Szepesvári Theoretical Computer Science 410 (19), 1876-1902, 2009 | 773 | 2009 |
Thompson sampling: An asymptotically optimal finite-time analysis E Kaufmann, N Korda, R Munos International conference on algorithmic learning theory, 199-213, 2012 | 769 | 2012 |
Safe and efficient off-policy reinforcement learning R Munos, T Stepleton, A Harutyunyan, M Bellemare Advances in neural information processing systems 29, 2016 | 714 | 2016 |
Count-based exploration with neural density models G Ostrovski, MG Bellemare, A Oord, R Munos International conference on machine learning, 2721-2730, 2017 | 705 | 2017 |
Finite-time bounds for fitted value iteration R Munos, C Szepesvári Journal of Machine Learning Research 9 (5), 2008 | 624 | 2008 |
Automated curriculum learning for neural networks A Graves, MG Bellemare, J Menick, R Munos, K Kavukcuoglu International conference on machine learning, 1311-1320, 2017 | 610 | 2017 |
Successor features for transfer in reinforcement learning A Barreto, W Dabney, R Munos, JJ Hunt, T Schaul, HP van Hasselt, ... Advances in neural information processing systems 30, 2017 | 607 | 2017 |
Pure exploration in multi-armed bandits problems S Bubeck, R Munos, G Stoltz Algorithmic Learning Theory: 20th International Conference, ALT 2009, Porto …, 2009 | 603 | 2009 |
Implicit quantile networks for distributional reinforcement learning W Dabney, G Ostrovski, D Silver, R Munos International conference on machine learning, 1096-1105, 2018 | 570 | 2018 |
Modification of UCT with Patterns in Monte-Carlo Go S Gelly, Y Wang, R Munos, O Teytaud INRIA, 2006 | 540 | 2006 |
Recurrent experience replay in distributed reinforcement learning S Kapturowski, G Ostrovski, J Quan, R Munos, W Dabney International conference on learning representations, 2018 | 533 | 2018 |