Title | Authors | Venue | Cited by | Year |
Transformers: State-of-the-art natural language processing | T Wolf, L Debut, V Sanh, J Chaumond, C Delangue, A Moi, P Cistac, ... | Proceedings of the 2020 Conference on Empirical Methods in Natural Language …, 2020 | 14187* | 2020 |
BLOOM: A 176B-parameter open-access multilingual language model | T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ... | | 1329 | 2023 |
Datasets: A Community Library for Natural Language Processing | Q Lhoest, A Villanova del Moral, Y Jernite, A Thakur, P von Platen, S Patil, ... | Proceedings of the 2021 Conference on Empirical Methods in Natural Language …, 2021 | 474* | 2021 |
The BigScience ROOTS corpus: A 1.6TB composite multilingual dataset | H Laurençon, L Saulnier, T Wang, C Akiki, A Villanova del Moral, ... | Advances in Neural Information Processing Systems 35, 31809-31826, 2022 | 132 | 2022 |
Distributed deep learning in open collaborations | M Diskin, A Bukhtiyarov, M Ryabinin, L Saulnier, A Sinitsin, D Popov, ... | Advances in Neural Information Processing Systems 34, 7879-7897, 2021 | 44 | 2021 |
Evaluate & Evaluation on the Hub: better best practices for data and model measurements | L Von Werra, L Tunstall, A Thakur, S Luccioni, T Thrush, A Piktus, F Marty, ... | Proceedings of the 2022 Conference on Empirical Methods in Natural Language …, 2022 | 19 | 2022 |
Training transformers together | A Borzunov, M Ryabinin, T Dettmers, Q Lhoest, L Saulnier, M Diskin, ... | NeurIPS 2021 Competitions and Demonstrations Track, 335-342, 2022 | 9 | 2022 |
Croissant: A Metadata Format for ML-Ready Datasets | M Akhtar, O Benjelloun, C Conforti, P Gijsbers, J Giner-Miguelez, N Jain, ... | Proceedings of the Eighth Workshop on Data Management for End-to-End Machine …, 2024 | 5 | 2024 |
AfroDigits: A Community-Driven Spoken Digit Dataset for African Languages | CC Emezue, S Gandhi, L Tunstall, A Abid, J Meyer, Q Lhoest, P Allen, ... | arXiv preprint arXiv:2303.12582, 2023 | | 2023 |
Actes de la conférence CAID 2020 | F de Vieilleville, S May, A Lagrange, A Dupuis, R Ruiloba, FN Mboula, ... | | | 2021 |