Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context G Team, P Georgiev, VI Lei, R Burnell, L Bai, A Gulati, G Tanzer, ... arXiv preprint arXiv:2403.05530, 2024 | 633 | 2024 |
Hyperparameter selection for offline reinforcement learning TL Paine, C Paduraru, A Michi, C Gulcehre, K Zolna, A Novikov, Z Wang, ... arXiv preprint arXiv:2007.09055, 2020 | 167 | 2020 |
Faster sorting algorithms discovered using deep reinforcement learning DJ Mankowitz, A Michi, A Zhernov, M Gelmi, M Selvi, C Paduraru, ... Nature 618 (7964), 257-263, 2023 | 164 | 2023 |
Nash learning from human feedback R Munos, M Valko, D Calandriello, MG Azar, M Rowland, ZD Guo, Y Tang, ... arXiv preprint arXiv:2312.00886, 2023 | 78 | 2023 |
A generic human–machine annotation framework based on dynamic cooperative learning Y Zhang, A Michi, J Wagner, E André, B Schuller, F Weninger IEEE Transactions on Cybernetics 50 (3), 1230-1239, 2019 | 19 | 2019 |
BOND: Aligning LLMs with Best-of-N Distillation PG Sessa, R Dadashi, L Hussenot, J Ferret, N Vieillard, A Ramé, ... arXiv preprint arXiv:2407.14622, 2024 | 13 | 2024 |
Conditional Language Policy: A General Framework for Steerable Multi-Objective Finetuning K Wang, R Kidambi, R Sullivan, A Agarwal, C Dann, A Michi, M Gelmi, ... arXiv preprint arXiv:2407.15762, 2024 | 7 | 2024 |
Towards practical reinforcement learning for tokamak magnetic control BD Tracey, A Michi, Y Chervonyi, I Davies, C Paduraru, N Lazic, F Felici, ... Fusion Engineering and Design 200, 114161, 2024 | 5 | 2024 |
Towards practical reinforcement learning for tokamak magnetic control BD Tracey, A Michi, Y Chervonyi, I Davies, C Paduraru, N Lazic, F Felici, ... arXiv preprint arXiv:2307.11546, 2023 | 4 | 2023 |
Offline hyperparameter selection for offline reinforcement learning T Le Paine, C Paduraru, A Michi, C Gulcehre, K Zołna, A Novikov, ... | | |