ByT5: Towards a token-free future with pre-trained byte-to-byte models
Most widely used pre-trained language models operate on sequences of tokens
corresponding to word or subword units. By comparison, token-free models that operate …
Language varieties of Italy: Technology challenges and opportunities
A Ramponi - Transactions of the Association for Computational …, 2024 - direct.mit.edu
Italy is characterized by a one-of-a-kind linguistic diversity landscape in Europe, which
implicitly encodes local knowledge, cultural traditions, artistic expressions, and history of its …
Systematic Inequalities in Language Technology Performance across the World's Languages
Natural language processing (NLP) systems have become a central technology in
communication, education, medicine, artificial intelligence, and many other domains of …
State-of-the-art generalisation research in NLP: a taxonomy and review
The ability to generalise well is one of the primary desiderata of natural language
processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is …
UniMorph 4.0: universal morphology
The Universal Morphology (UniMorph) project is a collaborative effort providing broad-
coverage instantiated normalized morphological inflection tables for hundreds of diverse …
Findings of the WMT shared task on machine translation using terminologies
MMI Alam, I Kvapilíková… - Proceedings of the …, 2021 - aclanthology.org
Language domains that require very careful use of terminology are abundant and
reflect a significant part of the translation industry. In this work we introduce a benchmark for …
IGT2P: From interlinear glossed texts to paradigms
An intermediate step in the linguistic analysis of an under-documented language is to find
and organize inflected forms that are attested in natural speech. From this data, linguists …
Can a transformer pass the wug test? Tuning copying bias in neural morphological inflection models
L Liu, M Hulden - arXiv preprint arXiv:2104.06483, 2021 - arxiv.org
Deep learning sequence models have been successfully applied to the task of
morphological inflection. The results of the SIGMORPHON shared tasks in the past several …
Morphological inflection: A reality check
Morphological inflection is a popular task in sub-word NLP with both practical and cognitive
applications. For years now, state-of-the-art systems have reported high, but also highly …
Ensemble self-training for low-resource languages: Grapheme-to-phoneme conversion and morphological inflection
We present an iterative data augmentation framework, which trains and searches for an
optimal ensemble and simultaneously annotates new training data in a self-training style …