UniMorph 3.0: Universal Morphology

AD McCarthy, C Kirov, M Grella… - … of The 12th …, 2020 - research-collection.ethz.ch
The Universal Morphology (UniMorph) project is a collaborative effort providing
broad-coverage instantiated normalized morphological paradigms for hundreds of diverse world …

The Johns Hopkins University Bible corpus: 1600+ tongues for typological exploration

AD McCarthy, R Wicks, D Lewis, A Mueller… - Proceedings of the …, 2020 - aclanthology.org
We present findings from the creation of a massively parallel corpus in over 1600
languages, the Johns Hopkins University Bible Corpus (JHUBC). The corpus consists of …

Pre-trained multilingual sequence-to-sequence models: A hope for low-resource language translation?

ESA Lee, S Thillainathan, S Nayak… - arXiv preprint arXiv …, 2022 - arxiv.org
What can pre-trained multilingual sequence-to-sequence models like mBART contribute to
translating low-resource languages? We conduct a thorough empirical experiment in 10 …

Morphological Processing of Low-Resource Languages: Where We Are and What's Next

A Wiemerslage, M Silfverberg, C Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
Automatic morphological processing can aid downstream natural language processing
applications, especially for low-resource languages, and assist language documentation …

The SIGMORPHON 2020 shared task on unsupervised morphological paradigm completion

K Kann, A McCarthy, G Nicolai, M Hulden - arXiv preprint arXiv …, 2020 - arxiv.org
In this paper, we describe the findings of the SIGMORPHON 2020 shared task on
unsupervised morphological paradigm completion (SIGMORPHON 2020 Task 2), a novel …

Meeting the needs of low-resource languages: The value of automatic alignments via pretrained models

A Ebrahimi, AD McCarthy, A Oncevay… - arXiv preprint arXiv …, 2023 - arxiv.org
Large multilingual models have inspired a new class of word alignment methods, which
work well for the model's pretraining languages. However, the languages most in need of …

The SIGMORPHON 2022 Shared Task on Cross-lingual and Low-Resource Grapheme-to-Phoneme Conversion

AD McCarthy, JL Lee, A DeLucia… - Proceedings of the …, 2023 - aclanthology.org
Grapheme-to-phoneme conversion is an important component in many speech
technologies, but until recently there were no multilingual benchmarks for this task. The third …

Joint learning model for low-resource agglutinative language morphological tagging

G Abudouwaili, K Abiderexiti, N Yi… - Proceedings of the 20th …, 2023 - aclanthology.org
Due to the lack of data resources, rule-based or transfer learning is mainly used in the
morphological tagging of low-resource languages. However, these methods require expert …

Codex to corpus: Exploring annotation and processing for an open and extensible machine-readable edition of the Florentine Codex

F Tyers, R Pugh, V Berthoud - Proceedings of the Workshop on …, 2023 - aclanthology.org
This paper describes an ongoing effort to create, from the original hand-written text, a
machine-readable, linguistically-annotated, and easily-searchable corpus of the Nahuatl …

Developing finite-state language technology for Maya

R Pugh, F Tyers, Q Castañeda - Proceedings of the Workshop on …, 2023 - aclanthology.org
We describe a suite of finite-state language technologies for Maya, a Mayan language
spoken in Mexico. At the core is a computational model of Maya morphology and phonology …