Bandit algorithms T Lattimore, C Szepesvári Cambridge University Press, 2020 | 2760 | 2020 |
Unifying PAC and regret: Uniform PAC bounds for episodic reinforcement learning C Dann, T Lattimore, E Brunskill Advances in Neural Information Processing Systems 30, 2017 | 304 | 2017 |
Causal bandits: Learning good interventions via causal inference F Lattimore, T Lattimore, MD Reid Advances in Neural Information Processing Systems 29, 2016 | 264* | 2016 |
Degenerate feedback loops in recommender systems R Jiang, S Chiappa, T Lattimore, A György, P Kohli Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 383-390, 2019 | 223 | 2019 |
Learning with good feature representations in bandits and in RL with a generative model T Lattimore, C Szepesvári, G Weisz International Conference on Machine Learning, 5662-5670, 2020 | 182 | 2020 |
Behaviour suite for reinforcement learning I Osband, Y Doron, M Hessel, J Aslanides, E Sezener, A Saraiva, ... arXiv preprint arXiv:1908.03568, 2019 | 178 | 2019 |
PAC bounds for discounted MDPs T Lattimore, M Hutter Algorithmic Learning Theory: 23rd International Conference, ALT 2012, Lyon …, 2012 | 140 | 2012 |
The end of optimism? An asymptotic analysis of finite-armed linear bandits T Lattimore, C Szepesvári Artificial Intelligence and Statistics, 728-737, 2017 | 134 | 2017 |
Conservative bandits Y Wu, R Shariff, T Lattimore, C Szepesvári International Conference on Machine Learning, 1254-1262, 2016 | 122 | 2016 |
On explore-then-commit strategies A Garivier, T Lattimore, E Kaufmann Advances in Neural Information Processing Systems 29, 2016 | 117 | 2016 |
A geometric perspective on optimal representations for reinforcement learning M Bellemare, W Dabney, R Dadashi, A Ali Taiga, PS Castro, N Le Roux, ... Advances in Neural Information Processing Systems 32, 2019 | 99 | 2019 |
Model selection in contextual stochastic bandit problems A Pacchiano, M Phan, Y Abbasi Yadkori, A Rao, J Zimmert, T Lattimore, ... Advances in Neural Information Processing Systems 33, 10328-10337, 2020 | 94 | 2020 |
Garbage in, reward out: Bootstrapping exploration in multi-armed bandits B Kveton, C Szepesvári, S Vaswani, Z Wen, T Lattimore, M Ghavamzadeh International Conference on Machine Learning, 3601-3610, 2019 | 77 | 2019 |
TopRank: A practical algorithm for online stochastic ranking T Lattimore, B Kveton, S Li, C Szepesvári Advances in Neural Information Processing Systems 31, 2018 | 71 | 2018 |
Linear bandits with stochastic delayed feedback C Vernade, A Carpentier, T Lattimore, G Zappella, B Ermis, M Brueckner International Conference on Machine Learning, 9712-9721, 2020 | 70 | 2020 |
The sample-complexity of general reinforcement learning T Lattimore, M Hutter, P Sunehag International Conference on Machine Learning, 28-36, 2013 | 70 | 2013 |
Near-optimal PAC bounds for discounted MDPs T Lattimore, M Hutter Theoretical Computer Science 558, 125-143, 2014 | 69 | 2014 |
Bounded Regret for Finite-Armed Structured Bandits T Lattimore, R Munos | 68 | 2014 |
Adaptive exploration in linear contextual bandit B Hao, T Lattimore, C Szepesvári International Conference on Artificial Intelligence and Statistics, 3536-3545, 2020 | 65 | 2020 |
An information-theoretic approach to minimax regret in partial monitoring T Lattimore, C Szepesvári Conference on Learning Theory, 2111-2139, 2019 | 64 | 2019 |