Continual lifelong learning in natural language processing: A survey

M Biesialska, K Biesialska, MR Costa-Jussa - arXiv preprint arXiv …, 2020 - arxiv.org
Continual learning (CL) aims to enable information systems to learn from a continuous data
stream across time. However, it is difficult for existing deep learning architectures to learn a …

NusaCrowd: Open source initiative for Indonesian NLP resources

S Cahyawijaya, H Lovenia, AF Aji… - Findings of the …, 2023 - aclanthology.org
We present NusaCrowd, a collaborative initiative to collect and unify existing resources for
Indonesian languages, including opening access to previously non-public resources …

Crosslingual generalization through multitask finetuning

N Muennighoff, T Wang, L Sutawika, A Roberts… - arXiv preprint arXiv …, 2022 - arxiv.org
Multitask prompted finetuning (MTF) has been shown to help large language models
generalize to new tasks in a zero-shot setting, but so far explorations of MTF have focused …

mT5: A massively multilingual pre-trained text-to-text transformer

L Xue - arXiv preprint arXiv:2010.11934, 2020 - fq.pkwyx.com
The recent" Text-to-Text Transfer Transformer"(T5) leveraged a unified text-to-text format and
scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this …

Merging models with fisher-weighted averaging

MS Matena, CA Raffel - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Averaging the parameters of models that have the same architecture and initialization can
provide a means of combining their respective capabilities. In this paper, we take the …

Pretrained transformers for text ranking: BERT and beyond

J Lin, R Nogueira, A Yates - 2022 - books.google.com
The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in
response to a query. Although the most common formulation of text ranking is search …

Canine: Pre-training an Efficient Tokenization-Free Encoder for Language Representation

JH Clark, D Garrette, I Turc, J Wieting - Transactions of the Association …, 2022 - direct.mit.edu
Pipelined NLP systems have largely been superseded by end-to-end neural modeling, yet
nearly all commonly used models still require an explicit tokenization step. While recent …

Neural unsupervised domain adaptation in NLP – a survey

A Ramponi, B Plank - arXiv preprint arXiv:2006.00632, 2020 - arxiv.org
Deep neural networks excel at learning from labeled data and achieve state-of-the-art
results on a wide array of Natural Language Processing tasks. In contrast, learning from …

XTREME-R: Towards more challenging and nuanced multilingual evaluation

S Ruder, N Constant, J Botha, A Siddhant… - arXiv preprint arXiv …, 2021 - arxiv.org
Machine learning has brought striking advances in multilingual natural language processing
capabilities over the past year. For example, the latest techniques have improved the state …

Rethinking embedding coupling in pre-trained language models

HW Chung, T Fevry, H Tsai, M Johnson… - arXiv preprint arXiv …, 2020 - arxiv.org
We re-evaluate the standard practice of sharing weights between input and output
embeddings in state-of-the-art pre-trained language models. We show that decoupled …