Transfer in reinforcement learning: a framework and a survey A Lazaric Reinforcement Learning: State-of-the-Art, 143-173, 2012 | 362 | 2012 |
Best arm identification: A unified approach to fixed budget and fixed confidence V Gabillon, M Ghavamzadeh, A Lazaric Advances in Neural Information Processing Systems 25, 2012 | 344 | 2012 |
Linear Thompson sampling revisited M Abeille, A Lazaric Artificial Intelligence and Statistics, 176-184, 2017 | 262 | 2017 |
Mastering visual continuous control: Improved data-augmented reinforcement learning D Yarats, R Fergus, A Lazaric, L Pinto arXiv preprint arXiv:2107.09645, 2021 | 244 | 2021 |
Learning near optimal policies with low inherent Bellman error A Zanette, A Lazaric, M Kochenderfer, E Brunskill International Conference on Machine Learning, 10978-10989, 2020 | 223 | 2020 |
Reinforcement learning with prototypical representations D Yarats, R Fergus, A Lazaric, L Pinto International Conference on Machine Learning, 11920-11931, 2021 | 206 | 2021 |
Best-arm identification in linear bandits M Soare, A Lazaric, R Munos Advances in Neural Information Processing Systems 27, 2014 | 206 | 2014 |
Transfer of samples in batch reinforcement learning A Lazaric, M Restelli, A Bonarini Proceedings of the 25th International Conference on Machine Learning, 544-551, 2008 | 205 | 2008 |
Reinforcement learning in continuous action spaces through sequential Monte Carlo methods A Lazaric, M Restelli, A Bonarini Advances in Neural Information Processing Systems 20, 2007 | 194 | 2007 |
Risk-aversion in multi-armed bandits A Sani, A Lazaric, R Munos Advances in Neural Information Processing Systems 25, 2012 | 184 | 2012 |
Bayesian multi-task reinforcement learning A Lazaric, M Ghavamzadeh ICML - 27th International Conference on Machine Learning, 599-606, 2010 | 142 | 2010 |
Frequentist regret bounds for randomized least-squares value iteration A Zanette, D Brandfonbrener, E Brunskill, M Pirotta, A Lazaric International Conference on Artificial Intelligence and Statistics, 1954-1964, 2020 | 141 | 2020 |
Reinforcement learning of POMDPs using spectral methods K Azizzadenesheli, A Lazaric, A Anandkumar Conference on Learning Theory, 193-256, 2016 | 135 | 2016 |
Finite-sample analysis of least-squares policy iteration A Lazaric, M Ghavamzadeh, R Munos Journal of Machine Learning Research 13, 3041-3074, 2012 | 130 | 2012 |
Upper-confidence-bound algorithms for active learning in multi-armed bandits A Carpentier, A Lazaric, M Ghavamzadeh, R Munos, P Auer International Conference on Algorithmic Learning Theory, 189-203, 2011 | 127 | 2011 |
Multi-bandit best arm identification V Gabillon, M Ghavamzadeh, A Lazaric, S Bubeck Advances in Neural Information Processing Systems 24, 2011 | 124 | 2011 |
Sequential transfer in multi-armed bandit with finite set of models A Lazaric, E Brunskill Advances in Neural Information Processing Systems 26, 2013 | 116 | 2013 |
Efficient bias-span-constrained exploration-exploitation in reinforcement learning R Fruit, M Pirotta, A Lazaric, R Ortner International Conference on Machine Learning, 1578-1586, 2018 | 112 | 2018 |
Improved regret bounds for Thompson sampling in linear quadratic control problems M Abeille, A Lazaric International Conference on Machine Learning, 1-9, 2018 | 105 | 2018 |
A truthful learning mechanism for contextual multi-slot sponsored search auctions with externalities N Gatti, A Lazaric, F Trovo Proceedings of the 13th ACM Conference on Electronic Commerce, 605-622, 2012 | 95 | 2012 |