Continual lifelong learning in natural language processing: A survey

M Biesialska, K Biesialska, MR Costa-Jussa - arXiv preprint arXiv …, 2020 - arxiv.org
Continual learning (CL) aims to enable information systems to learn from a continuous data
stream across time. However, it is difficult for existing deep learning architectures to learn a …

NusaCrowd: Open source initiative for Indonesian NLP resources

S Cahyawijaya, H Lovenia, AF Aji… - Findings of the …, 2023 - aclanthology.org
We present NusaCrowd, a collaborative initiative to collect and unify existing resources for
Indonesian languages, including opening access to previously non-public resources …

Crosslingual generalization through multitask finetuning

N Muennighoff, T Wang, L Sutawika, A Roberts… - arXiv preprint arXiv …, 2022 - arxiv.org
Multitask prompted finetuning (MTF) has been shown to help large language models
generalize to new tasks in a zero-shot setting, but so far explorations of MTF have focused …

mT5: A massively multilingual pre-trained text-to-text transformer

L Xue - arXiv preprint arXiv:2010.11934, 2020 - fq.pkwyx.com
The recent" Text-to-Text Transfer Transformer"(T5) leveraged a unified text-to-text format and
scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this …

Merging models with fisher-weighted averaging

MS Matena, CA Raffel - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Averaging the parameters of models that have the same architecture and initialization can
provide a means of combining their respective capabilities. In this paper, we take the …

Pretrained transformers for text ranking: BERT and beyond

J Lin, R Nogueira, A Yates - 2022 - books.google.com
The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in
response to a query. Although the most common formulation of text ranking is search …

Canine: Pre-training an Efficient Tokenization-Free Encoder for Language Representation

JH Clark, D Garrette, I Turc, J Wieting - Transactions of the Association …, 2022 - direct.mit.edu
Pipelined NLP systems have largely been superseded by end-to-end neural modeling, yet
nearly all commonly used models still require an explicit tokenization step. While recent …

Neural unsupervised domain adaptation in NLP – a survey

A Ramponi, B Plank - arXiv preprint arXiv:2006.00632, 2020 - arxiv.org
Deep neural networks excel at learning from labeled data and achieve state-of-the-art
results on a wide array of Natural Language Processing tasks. In contrast, learning from …

XTREME-R: Towards more challenging and nuanced multilingual evaluation

S Ruder, N Constant, J Botha, A Siddhant… - arXiv preprint arXiv …, 2021 - arxiv.org
Machine learning has brought striking advances in multilingual natural language processing
capabilities over the past year. For example, the latest techniques have improved the state …

Rethinking embedding coupling in pre-trained language models

HW Chung, T Fevry, H Tsai, M Johnson… - arXiv preprint arXiv …, 2020 - arxiv.org
We re-evaluate the standard practice of sharing weights between input and output
embeddings in state-of-the-art pre-trained language models. We show that decoupled …