Analyzing the mono- and cross-lingual pretraining dynamics of multilingual language models

T Blevins, H Gonen, L Zettlemoyer - arXiv preprint arXiv:2205.11758, 2022 - arxiv.org
The emergent cross-lingual transfer seen in multilingual pretrained models has sparked
significant interest in studying their behavior. However, because these analyses have …

Discovering language-neutral sub-networks in multilingual language models

N Foroutan, M Banaei, R Lebret, A Bosselut… - arXiv preprint arXiv …, 2022 - arxiv.org
Multilingual pre-trained language models transfer remarkably well on cross-lingual
downstream tasks. However, the extent to which they learn language-neutral …

How do languages influence each other? Studying cross-lingual data sharing during LLM fine-tuning

R Choenni, D Garrette, E Shutova - arXiv preprint arXiv:2305.13286, 2023 - arxiv.org
Multilingual large language models (MLLMs) are jointly trained on data from many different
languages such that representation of individual languages can benefit from other …

Cross-linguistic syntactic difference in multilingual BERT: how good is it and how does it affect transfer?

N Xu, T Gui, R Ma, Q Zhang, J Ye, M Zhang… - arXiv preprint arXiv …, 2022 - arxiv.org
Multilingual BERT (mBERT) has demonstrated considerable cross-lingual syntactic ability,
whereby it enables effective zero-shot cross-lingual transfer of syntactic knowledge. The …

Data-driven cross-lingual syntax: An agreement study with massively multilingual models

AG Varda, M Marelli - Computational Linguistics, 2023 - direct.mit.edu
Massively multilingual models such as mBERT and XLM-R are increasingly valued in
Natural Language Processing research and applications, due to their ability to tackle the …

Differential privacy, linguistic fairness, and training data influence: Impossibility and possibility theorems for multilingual language models

P Rust, A Søgaard - International Conference on Machine …, 2023 - proceedings.mlr.press
Language models such as mBERT, XLM-R, and BLOOM aim to achieve
multilingual generalization or compression to facilitate transfer to a large number of …

On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons

T Kojima, I Okimura, Y Iwasawa, H Yanaka… - arXiv preprint arXiv …, 2024 - arxiv.org
Current decoder-based pre-trained language models (PLMs) successfully demonstrate
multilingual capabilities. However, it is unclear how these models handle multilingualism …

Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios?

A Riabi, B Sagot, D Seddah - arXiv preprint arXiv:2110.13658, 2021 - arxiv.org
Recent impressive improvements in NLP, largely based on the success of contextual neural
language models, have been mostly demonstrated on at most a couple dozen high-resource …

BioBERTurk: Exploring Turkish Biomedical Language Model Development Strategies in Low-Resource Setting

H Türkmen, O Dikenelli, C Eraslan, MC Callı… - Journal of Healthcare …, 2023 - Springer
Pretrained language models augmented with in-domain corpora show impressive results in
biomedicine and clinical Natural Language Processing (NLP) tasks in English. However …

Comparing styles across languages

S Havaldar, M Pressimone, E Wong… - arXiv preprint arXiv …, 2023 - arxiv.org
Understanding how styles differ across languages is advantageous for training both humans
and computers to generate culturally appropriate text. We introduce an explanation …