1. Transformers: State-of-the-art natural language processing. T Wolf, L Debut, V Sanh, J Chaumond, C Delangue, A Moi, P Cistac, et al. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020. Cited by 14185*.
2. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. V Sanh, L Debut, J Chaumond, T Wolf. arXiv preprint arXiv:1910.01108, 2019. Cited by 7078.
3. Multitask Prompted Training Enables Zero-Shot Task Generalization. V Sanh, A Webson, C Raffel, SH Bach, L Sutawika, Z Alyafeai, A Chaffin, et al. arXiv preprint arXiv:2110.08207, 2021. Cited by 1372.
4. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. TL Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, et al. arXiv preprint arXiv:2211.05100, 2022. Cited by 1329.
5. TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents. T Wolf, V Sanh, J Chaumond, C Delangue. arXiv preprint arXiv:1901.08149, 2019. Cited by 519.
6. Datasets: A Community Library for Natural Language Processing. Q Lhoest, AV del Moral, Y Jernite, A Thakur, P von Platen, S Patil, et al. arXiv preprint arXiv:2109.02846, 2021. Cited by 451*.
7. Movement pruning: Adaptive sparsity by fine-tuning. V Sanh, T Wolf, A Rush. Advances in Neural Information Processing Systems 33, 20378-20389, 2020. Cited by 397.
8. PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts. SH Bach, V Sanh, ZX Yong, A Webson, C Raffel, NV Nayak, A Sharma, et al. arXiv preprint arXiv:2202.01279, 2022. Cited by 265.
9. A hierarchical multi-task approach for learning embeddings from semantic tasks. V Sanh, T Wolf, S Ruder. Proceedings of the AAAI Conference on Artificial Intelligence 33, 6949-6956, 2019. Cited by 260.
10. Block Pruning For Faster Transformers. F Lagunas, E Charlaix, V Sanh, AM Rush. arXiv preprint arXiv:2109.04838, 2021. Cited by 177.
11. Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models. H Strobelt, A Webson, V Sanh, B Hoover, J Beyer, H Pfister, AM Rush. IEEE Transactions on Visualization and Computer Graphics, 2022. Cited by 123.
12. OBELISC: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents. H Laurençon, L Saulnier, L Tronchon, S Bekman, A Singh, A Lozhkov, et al. arXiv preprint arXiv:2306.16527, 2023. Cited by 118.
13. EdgeBERT: Sentence-level energy optimizations for latency-aware multi-task NLP inference. T Tambe, C Hooper, L Pentecost, T Jia, EY Yang, M Donato, V Sanh, et al. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture …, 2021. Cited by 100*.
14. Learning from others' mistakes: Avoiding dataset biases without modeling them. V Sanh, T Wolf, Y Belinkov, AM Rush. arXiv preprint arXiv:2012.01300, 2020. Cited by 89.
15. What Language Model to Train if You Have One Million GPU Hours? T Le Scao, T Wang, D Hesslow, L Saulnier, S Bekman, MS Bari, et al. Challenges & …, 2022. Cited by 86.
16. Low-Complexity Probing via Finding Subnetworks. S Cao, V Sanh, AM Rush. arXiv preprint arXiv:2104.03514, 2021. Cited by 37.
17. What matters when building vision-language models? H Laurençon, L Tronchon, M Cord, V Sanh. arXiv preprint arXiv:2405.02246, 2024. Cited by 31.
18. Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning. PA Utama, NS Moosavi, V Sanh, I Gurevych. arXiv preprint arXiv:2109.04144, 2021. Cited by 29.
19. Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset. H Laurençon, L Tronchon, V Sanh. arXiv preprint arXiv:2403.09029, 2024. Cited by 4.