BLOOM: A 176B-parameter open-access multilingual language model T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ... | 1360 | 2023 |
MTEB: Massive text embedding benchmark N Muennighoff, N Tazi, L Magne, N Reimers EACL 2023, 2022 | 254 | 2022 |
Scaling Data-Constrained Language Models N Muennighoff, AM Rush, B Barak, TL Scao, A Piktus, N Tazi, S Pyysalo, ... NeurIPS 2023, 2023 | 120 | 2023 |
StarCoder 2 and The Stack v2: The next generation A Lozhkov, R Li, LB Allal, F Cassano, J Lamy-Poirier, N Tazi, A Tang, ... arXiv preprint arXiv:2402.19173, 2024 | 59 | 2024 |
FinGPT: Large generative models for a small language R Luukkonen, V Komulainen, J Luoma, A Eskelinen, J Kanerva, ... arXiv preprint arXiv:2311.05640, 2023 | 21 | 2023 |
Masader Plus: A New Interface for Exploring +500 Arabic NLP Datasets Y Altaher, A Fadel, M Alotaibi, M Alyazidi, M Al-Mutairi, M Aldhbuiub, ... arXiv preprint arXiv:2208.00932, 2022 | 2 | 2022 |