Transfer in deep reinforcement learning using successor features and generalised policy improvement A Barreto, D Borsa, J Quan, T Schaul, D Silver, M Hessel, D Mankowitz, ... International Conference on Machine Learning, 501-510, 2018 | 188 | 2018 |
Fast reinforcement learning with generalized policy updates A Barreto, S Hou, D Borsa, D Silver, D Precup Proceedings of the National Academy of Sciences 117 (48), 30079-30087, 2020 | 138 | 2020 |
Universal successor features approximators D Borsa, A Barreto, J Quan, D Mankowitz, R Munos, H Van Hasselt, ... arXiv preprint arXiv:1812.07626, 2018 | 124 | 2018 |
The option keyboard: Combining skills in reinforcement learning A Barreto, D Borsa, S Hou, G Comanici, E Aygün, P Hamel, D Toyama, ... Advances in Neural Information Processing Systems 32, 2019 | 97 | 2019 |
Detecting disease outbreaks in mass gatherings using Internet data E Yom-Tov, D Borsa, IJ Cox, RA McKendry Journal of Medical Internet Research 16 (6), e3156, 2014 | 74 | 2014 |
Observational learning by reinforcement learning D Borsa, B Piot, R Munos, O Pietquin arXiv preprint arXiv:1706.06617, 2017 | 72 | 2017 |
Ray interference: a source of plateaus in deep reinforcement learning T Schaul, D Borsa, J Modayil, R Pascanu arXiv preprint arXiv:1904.11455, 2019 | 65 | 2019 |
The termination critic A Harutyunyan, W Dabney, D Borsa, N Heess, R Munos, D Precup arXiv preprint arXiv:1902.09996, 2019 | 57 | 2019 |
Learning shared representations in multi-task reinforcement learning D Borsa, T Graepel, J Shawe-Taylor arXiv preprint arXiv:1603.02041, 2016 | 47 | 2016 |
Expected eligibility traces H van Hasselt, S Madjiheurem, M Hessel, D Silver, A Barreto, D Borsa Proceedings of the AAAI Conference on Artificial Intelligence 35 (11), 9997 …, 2021 | 44 | 2021 |
Training deep neural nets to aggregate crowdsourced responses A Gaunt, D Borsa, Y Bachrach Proceedings of the Thirty-Second Conference on Uncertainty in Artificial …, 2016 | 33 | 2016 |
Automatic identification of Web-based risk markers for health events E Yom-Tov, D Borsa, AC Hayward, RA McKendry, IJ Cox Journal of Medical Internet Research 17 (1), e29, 2015 | 33 | 2015 |
When should agents explore? M Pislar, D Szepesvari, G Ostrovski, D Borsa, T Schaul arXiv preprint arXiv:2108.11811, 2021 | 26 | 2021 |
Adapting behaviour for learning progress T Schaul, D Borsa, D Ding, D Szepesvari, G Ostrovski, W Dabney, ... arXiv preprint arXiv:1912.06910, 2019 | 16 | 2019 |
Temporal difference uncertainties as a signal for exploration S Flennerhag, JX Wang, P Sprechmann, F Visin, A Galashov, ... arXiv preprint arXiv:2010.02255, 2020 | 15 | 2020 |
Return-based scaling: Yet another normalisation trick for deep RL T Schaul, G Ostrovski, I Kemaev, D Borsa arXiv preprint arXiv:2105.05347, 2021 | 13 | 2021 |
Conditional importance sampling for off-policy learning M Rowland, A Harutyunyan, H van Hasselt, D Borsa, T Schaul, R Munos, ... International Conference on Artificial Intelligence and Statistics, 45-55, 2020 | 11 | 2020 |
General non-linear Bellman equations H van Hasselt, J Quan, M Hessel, Z Xu, D Borsa, A Barreto arXiv preprint arXiv:1907.03687, 2019 | 11 | 2019 |
Model-value inconsistency as a signal for epistemic uncertainty A Filos, E Vértes, Z Marinho, G Farquhar, D Borsa, A Friesen, ... arXiv preprint arXiv:2112.04153, 2021 | 10 | 2021 |
Generalised policy improvement with geometric policy composition S Thakoor, M Rowland, D Borsa, W Dabney, R Munos, A Barreto International Conference on Machine Learning, 21272-21307, 2022 | 6 | 2022 |