Enriching word vectors with subword information P Bojanowski, E Grave, A Joulin, T Mikolov Transactions of the Association for Computational Linguistics 5, 135-146, 2017 | 12450 | 2017 |
LLaMA: Open and efficient foundation language models H Touvron, T Lavril, G Izacard, X Martinet, MA Lachaux, T Lacroix, ... arXiv preprint arXiv:2302.13971, 2023 | 6674 | 2023 |
Bag of tricks for efficient text classification A Joulin, E Grave, P Bojanowski, T Mikolov arXiv preprint arXiv:1607.01759, 2016 | 6016 | 2016 |
Unsupervised cross-lingual representation learning at scale A Conneau, K Khandelwal, N Goyal, V Chaudhary, G Wenzek, F Guzmán, ... arXiv preprint arXiv:1911.02116, 2019 | 5530 | 2019 |
Learning word vectors for 157 languages E Grave, P Bojanowski, P Gupta, A Joulin, T Mikolov arXiv preprint arXiv:1802.06893, 2018 | 1798 | 2018 |
Advances in pre-training distributed word representations T Mikolov, E Grave, P Bojanowski, C Puhrsch, A Joulin arXiv preprint arXiv:1712.09405, 2017 | 1722 | 2017 |
FastText.zip: Compressing text classification models A Joulin, E Grave, P Bojanowski, M Douze, H Jégou, T Mikolov arXiv preprint arXiv:1612.03651, 2016 | 1568 | 2016 |
Parseval networks: Improving robustness to adversarial examples M Cisse, P Bojanowski, E Grave, Y Dauphin, N Usunier International Conference on Machine Learning, 854-863, 2017 | 880 | 2017 |
Leveraging passage retrieval with generative models for open domain question answering G Izacard, E Grave arXiv preprint arXiv:2007.01282, 2020 | 805 | 2020 |
Beyond English-centric multilingual machine translation A Fan, S Bhosale, H Schwenk, Z Ma, A El-Kishky, S Goyal, M Baines, ... Journal of Machine Learning Research 22 (107), 1-48, 2021 | 671 | 2021 |
ResMLP: Feedforward networks for image classification with data-efficient training H Touvron, P Bojanowski, M Caron, M Cord, A El-Nouby, E Grave, ... arXiv preprint arXiv:2105.03404, 2021 | 648* | 2021 |
Colorless green recurrent networks dream hierarchically K Gulordava, P Bojanowski, E Grave, T Linzen, M Baroni arXiv preprint arXiv:1803.11138, 2018 | 592 | 2018 |
Reducing transformer depth on demand with structured dropout A Fan, E Grave, A Joulin arXiv preprint arXiv:1909.11556, 2019 | 572 | 2019 |
CCNet: Extracting high quality monolingual datasets from web crawl data G Wenzek, MA Lachaux, A Conneau, V Chaudhary, F Guzmán, A Joulin, ... arXiv preprint arXiv:1911.00359, 2019 | 516 | 2019 |
Towards unsupervised dense information retrieval with contrastive learning G Izacard, M Caron, L Hosseini, S Riedel, P Bojanowski, A Joulin, ... arXiv preprint arXiv:2112.09118, 2021 | 463 | 2021 |
Atlas: Few-shot learning with retrieval augmented language models G Izacard, P Lewis, M Lomeli, L Hosseini, F Petroni, T Schick, ... Journal of Machine Learning Research 24 (251), 1-43, 2023 | 405 | 2023 |
Loss in translation: Learning bilingual word mapping with a retrieval criterion A Joulin, P Bojanowski, T Mikolov, H Jégou, E Grave arXiv preprint arXiv:1804.07745, 2018 | 355 | 2018 |
Augmented language models: a survey G Mialon, R Dessì, M Lomeli, C Nalmpantis, R Pasunuru, R Raileanu, ... arXiv preprint arXiv:2302.07842, 2023 | 342 | 2023 |
Improving neural language models with a continuous cache E Grave, A Joulin, N Usunier arXiv preprint arXiv:1612.04426, 2016 | 335 | 2016 |
Efficient softmax approximation for GPUs E Grave, A Joulin, M Cissé, D Grangier, H Jégou International Conference on Machine Learning, 1302-1310, 2017 | 311* | 2017 |