Analyzing the mono- and cross-lingual pretraining dynamics of multilingual language models

T Blevins, H Gonen, L Zettlemoyer - arXiv preprint arXiv:2205.11758, 2022 - arxiv.org
The emergent cross-lingual transfer seen in multilingual pretrained models has sparked
significant interest in studying their behavior. However, because these analyses have …

Discovering language-neutral sub-networks in multilingual language models

N Foroutan, M Banaei, R Lebret, A Bosselut… - arXiv preprint arXiv …, 2022 - arxiv.org
Multilingual pre-trained language models transfer remarkably well on cross-lingual
downstream tasks. However, the extent to which they learn language-neutral …

How do languages influence each other? Studying cross-lingual data sharing during LLM fine-tuning

R Choenni, D Garrette, E Shutova - arXiv preprint arXiv:2305.13286, 2023 - arxiv.org
Multilingual large language models (MLLMs) are jointly trained on data from many different
languages such that representation of individual languages can benefit from other …

Cross-linguistic syntactic difference in multilingual BERT: how good is it and how does it affect transfer?

N Xu, T Gui, R Ma, Q Zhang, J Ye, M Zhang… - arXiv preprint arXiv …, 2022 - arxiv.org
Multilingual BERT (mBERT) has demonstrated considerable cross-lingual syntactic ability,
whereby it enables effective zero-shot cross-lingual transfer of syntactic knowledge. The …

Data-driven cross-lingual syntax: An agreement study with massively multilingual models

AG Varda, M Marelli - Computational Linguistics, 2023 - direct.mit.edu
Massively multilingual models such as mBERT and XLM-R are increasingly valued in
Natural Language Processing research and applications, due to their ability to tackle the …

Differential privacy, linguistic fairness, and training data influence: Impossibility and possibility theorems for multilingual language models

P Rust, A Søgaard - International Conference on Machine …, 2023 - proceedings.mlr.press
Language models such as mBERT, XLM-R, and BLOOM aim to achieve
multilingual generalization or compression to facilitate transfer to a large number of …

On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons

T Kojima, I Okimura, Y Iwasawa, H Yanaka… - arXiv preprint arXiv …, 2024 - arxiv.org
Current decoder-based pre-trained language models (PLMs) successfully demonstrate
multilingual capabilities. However, it is unclear how these models handle multilingualism …

Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios?

A Riabi, B Sagot, D Seddah - arXiv preprint arXiv:2110.13658, 2021 - arxiv.org
Recent impressive improvements in NLP, largely based on the success of contextual neural
language models, have been mostly demonstrated on at most a couple dozen high-resource …

BioBERTurk: Exploring Turkish Biomedical Language Model Development Strategies in Low-Resource Setting

H Türkmen, O Dikenelli, C Eraslan, MC Callı… - Journal of Healthcare …, 2023 - Springer
Pretrained language models augmented with in-domain corpora show impressive results in
biomedicine and clinical Natural Language Processing (NLP) tasks in English. However …

Comparing styles across languages

S Havaldar, M Pressimone, E Wong… - arXiv preprint arXiv …, 2023 - arxiv.org
Understanding how styles differ across languages is advantageous for training both humans
and computers to generate culturally appropriate text. We introduce an explanation …