Gemini: a family of highly capable multimodal models. G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, et al. arXiv preprint arXiv:2312.11805, 2023. Cited by 816.
Acme: A research framework for distributed reinforcement learning. MW Hoffman, B Shahriari, J Aslanides, G Barth-Maron, N Momchev, et al. arXiv preprint arXiv:2006.00979, 2022. Cited by 237.
Leverage the average: an analysis of KL regularization in reinforcement learning. N Vieillard, T Kozuno, B Scherrer, O Pietquin, R Munos, M Geist. Advances in Neural Information Processing Systems 33, 12163-12174, 2020. Cited by 105*.
Munchausen reinforcement learning. N Vieillard, O Pietquin, M Geist. Advances in Neural Information Processing Systems 33, 4235-4246, 2020. Cited by 91.
Offline reinforcement learning as anti-exploration. S Rezaeifar, R Dadashi, N Vieillard, L Hussenot, O Bachem, O Pietquin, et al. Proceedings of the AAAI Conference on Artificial Intelligence 36 (7), 8106-8114, 2022. Cited by 48.
On-policy distillation of language models: learning from self-generated mistakes. R Agarwal, N Vieillard, Y Zhou, P Stanczyk, SR Garea, M Geist, et al. The Twelfth International Conference on Learning Representations, 2024. Cited by 40*.
Factually consistent summarization via reinforcement learning with textual entailment feedback. P Roit, J Ferret, L Shani, R Aharoni, G Cideron, R Dadashi, M Geist, et al. arXiv preprint arXiv:2306.00186, 2023. Cited by 38.
Offline reinforcement learning with pseudometric learning. R Dadashi, S Rezaeifar, N Vieillard, L Hussenot, O Pietquin, M Geist. International Conference on Machine Learning, 2307-2318, 2021. Cited by 38.
Momentum in reinforcement learning. N Vieillard, B Scherrer, O Pietquin, M Geist. International Conference on Artificial Intelligence and Statistics, 2529-2538, 2020. Cited by 34.
Deep conservative policy iteration. N Vieillard, O Pietquin, M Geist. Proceedings of the AAAI Conference on Artificial Intelligence 34 (04), 6070-6077, 2020. Cited by 27.
WARM: On the benefits of weight averaged reward models. A Ramé, N Vieillard, L Hussenot, R Dadashi, G Cideron, O Bachem, et al. arXiv preprint arXiv:2401.12187, 2024. Cited by 19.
On connections between constrained optimization and reinforcement learning. N Vieillard, O Pietquin, M Geist. arXiv preprint arXiv:1910.08476, 2019. Cited by 18.
Implicitly regularized RL with implicit Q-values. N Vieillard, M Andrychowicz, A Raichuk, O Pietquin, M Geist. arXiv preprint arXiv:2108.07041, 2021. Cited by 7.
KL-entropy-regularized RL with a generative model is minimax optimal. T Kozuno, W Yang, N Vieillard, T Kitamura, Y Tang, J Mei, P Ménard, et al. arXiv preprint arXiv:2205.14211, 2022. Cited by 6.
Regularization and variance-weighted regression achieves minimax optimality in linear MDPs: theory and practice. T Kitamura, T Kozuno, Y Tang, N Vieillard, M Valko, W Yang, J Mei, et al. International Conference on Machine Learning, 17135-17175, 2023. Cited by 2.
Training reinforcement learning agents using augmented temporal difference learning. MF Geist, N Vieillard, OC Pietquin. US Patent App. 17/347,264, 2021. Cited by 1.