Examining modularity in multilingual LMs via language-specialized subnetworks

R Choenni, E Shutova, D Garrette - arXiv preprint arXiv:2311.08273, 2023 - arxiv.org
Recent work has proposed explicitly inducing language-wise modularity in multilingual LMs
via sparse fine-tuning (SFT) on per-language subnetworks as a means of better guiding …

Gradient-based intra-attention pruning on pre-trained language models

Z Yang, Y Cui, X Yao, S Wang - arXiv preprint arXiv:2212.07634, 2022 - arxiv.org
Pre-trained language models achieve superior performance but are computationally
expensive. Techniques such as pruning and knowledge distillation have been developed to …

Cross-Lingual Transfer with Language-Specific Subnetworks for Low-Resource Dependency Parsing

R Choenni, D Garrette, E Shutova - Computational Linguistics, 2023 - direct.mit.edu
Large multilingual language models typically share their parameters across all languages,
which enables cross-lingual task transfer, but learning can also be hindered when training …

Multilinguality and Multiculturalism: Towards more Effective and Inclusive Neural Language Models

R Choenni - 2025 - eprints.illc.uva.nl
Large-scale pretraining requires vast amounts of text in a given language, which limits the
applicability of such techniques to a handful of high-resource languages. Therefore …