Pedro Ortiz Suarez: Academic Profile

Citations

	All	Since 2019
Citations	4010	4002
h-index	12	12
i10-index	15	15

Citations per year: 2020: 233; 2021: 398; 2022: 648; 2023: 1561; 2024: 1149

Public access

10 articles available

0 articles not available

Based on funding mandates

Co-authors

Benoît Sagot: Research Director at Inria, head of the ALMAnaCH team; verified email at inria.fr
Laurent Romary: Inria; verified email at inria.fr
Yoann Dupont: Associate Professor, Sorbonne Nouvelle; verified email at sorbonne-nouvelle.fr
Benjamin Muller: Researcher at Meta; verified email at meta.com
Louis Martin: Facebook A.I. Research / Inria; verified email at fb.com
Eric Villemonte De la Clergerie: INRIA; verified email at inria.fr
Djamé Seddah: Inria (Almanach) & Université Paris Sorbonne (Paris 4); verified email at paris-sorbonne.fr
Julien Abadji: Research Engineer, Inria; verified email at inria.fr
Simon Gabay: University of Geneva; verified email at unige.ch
Rachel Bawden: Inria; verified email at inria.fr
Philippe Gambette: Associate Professor of Computer Science, Université Gustave Eiffel; verified email at u-pem.fr
Matthieu Futeral: PhD student, Inria Paris; verified email at inria.fr
Alix Chagué: PhD student at Inria and Université de Montréal; verified email at inria.fr
Luca Foppiano: National Institute for Materials Science; verified email at nims.go.jp
Yoshihiko Takano: National Institute for Materials Science (NIMS); verified email at nims.go.jp
Colin Leong: University of Dayton; verified email at udayton.edu
Daniel van Strien: Hugging Face; verified email at huggingface.co
Angelina McMillan-Major: University of Washington; verified email at uw.edu
Yacine Jernite: Research Scientist, HuggingFace; verified email at cs.nyu.edu
Stella Biderman: Booz Allen Hamilton, EleutherAI; verified email at bah.com


Pedro Ortiz Suarez

Other names: Pedro Javier Ortiz Suárez

Senior Research Scientist, Common Crawl Foundation

Verified email at commoncrawl.org

Language modeling, Corpus linguistics, Named Entity Recognition, Computational Linguistics, Machine


Title	Cited by	Year
Bloom: A 176b-parameter open-access multilingual language model. T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ...	1416	2023
CamemBERT: a Tasty French Language Model. L Martin, B Muller, PJ Ortiz Suárez, Y Dupont, L Romary, ... Proceedings of the 58th Annual Meeting of the Association for Computational …, 2020	1176	2020
Asynchronous Pipeline for Processing Huge Corpora on Medium to Low Resource Infrastructures. PJ Ortiz Suárez, B Sagot, L Romary. 7th Workshop on the Challenges in the Management of Large Corpora, 2019	445*	2019
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets. J Kreutzer, I Caswell, L Wang, A Wahab, D van Esch, N Ulzii-Orshikh, ... Transactions of the Association for Computational Linguistics 10, 50-72, 2022	237*	2022
A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages. PJ Ortiz Suárez, L Romary, B Sagot. Proceedings of the 58th Annual Meeting of the Association for Computational …, 2020	222*	2020
Towards a Cleaner Document-Oriented Multilingual Crawled Corpus. J Abadji, P Ortiz Suarez, L Romary, B Sagot. arXiv preprint arXiv:2201.06642, 2022	145	2022
The bigscience roots corpus: A 1.6 tb composite multilingual dataset. H Laurençon, L Saulnier, T Wang, C Akiki, A Villanova del Moral, ... Advances in Neural Information Processing Systems 35, 31809-31826, 2022	140	2022
Ungoliant: An optimized pipeline for the generation of a very large-scale multilingual web corpus. J Abadji, PJO Suárez, L Romary, B Sagot. CMLC 2021-9th Workshop on Challenges in the Management of Large Corpora, 2021	57	2021
Building a user-generated content north-african arabizi treebank: Tackling hell. D Seddah, F Essaidi, A Fethi, M Futeral, B Muller, PJ Ortiz Suárez, ... Proceedings of the 58th Annual Meeting of the Association for Computational …, 2020	48	2020
Establishing a New State-of-the-Art for French Named Entity Recognition. PJ Ortiz Suárez, Y Dupont, B Muller, L Romary, B Sagot. Proceedings of The 12th Language Resources and Evaluation Conference, 4631–4638, 2020	25*	2020
From FreEM to D'AlemBERT: a Large Corpus and a Language Model for Early Modern French. S Gabay, P Ortiz Suarez, A Bartz, A Chagué, R Bawden, P Gambette, ... arXiv preprint arXiv:2202.09452, 2022	16	2022
Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources. A McMillan-Major, Z Alyafeai, S Biderman, K Chen, F De Toni, G Dupont, ... arXiv preprint arXiv:2201.10066, 2022	14	2022
Automatic extraction of materials and properties from superconductors scientific literature. L Foppiano, PB Castro, P Ortiz Suarez, K Terashima, Y Takano, M Ishii. Science and Technology of Advanced Materials: Methods 3 (1), 2153633, 2023	12	2023
Perplexed by quality: A perplexity-based method for adult and harmful content detection in multilingual heterogeneous web data. T Jansen, Y Tong, V Zevallos, PO Suarez. arXiv preprint arXiv:2212.10440, 2022	11	2022
Les modèles de langue contextuels Camembert pour le français: impact de la taille et de l'hétérogénéité des données d'entrainement. L Martin, B Muller, PJ Ortiz Suárez, Y Dupont, L Romary, E Clergerie, ... Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP …, 2020	11	2020
Bertrade: Using contextual embeddings to parse old french. L Grobol, M Regnault, PO Suarez, B Sagot, L Romary, B Crabbé. 13th Language Resources and Evaluation Conference, 2022	8	2022
Tokenizer Choice For LLM Training: Negligible or Crucial? M Ali, M Fromm, K Thellmann, R Rutmann, M Lübbering, J Leveling, ... arXiv preprint arXiv:2310.08754, 2023	7	2023
SinNer@CLEF-HIPE2020: Sinful Adaptation of SotA models for Named Entity Recognition in Historical French and German Newspapers. PJ Ortiz Suárez, Y Dupont, G Lejeune, T Tian. CLEF 2020 Working Notes 2696, 2020	7*	2020
French Contextualized Word-Embeddings with a sip of CaBeRnet: a New French Balanced Reference Corpus. M Popa-Fabre, PJ Ortiz Suárez, B Sagot, ÉV de la Clergerie. Proceedings of the 8th Workshop on Challenges in the Management of Large …, 2020	3	2020
How OCR Performance can Impact on the Automatic Extraction of Dictionary Content Structures. M Khemakhem, I Galleron, G Williams, L Romary, PJ Ortiz Suárez	3	2019
