UniMorph 3.0: Universal Morphology
The Universal Morphology (UniMorph) project is a collaborative effort providing broad-
coverage instantiated normalized morphological paradigms for hundreds of diverse world …
The Johns Hopkins University Bible corpus: 1600+ tongues for typological exploration
We present findings from the creation of a massively parallel corpus in over 1600
languages, the Johns Hopkins University Bible Corpus (JHUBC). The corpus consists of …
Pre-trained multilingual sequence-to-sequence models: A hope for low-resource language translation?
ESA Lee, S Thillainathan, S Nayak… - arXiv preprint arXiv …, 2022 - arxiv.org
What can pre-trained multilingual sequence-to-sequence models like mBART contribute to
translating low-resource languages? We conduct a thorough empirical experiment in 10 …
Morphological Processing of Low-Resource Languages: Where We Are and What's Next
Automatic morphological processing can aid downstream natural language processing
applications, especially for low-resource languages, and assist language documentation …
The SIGMORPHON 2020 shared task on unsupervised morphological paradigm completion
In this paper, we describe the findings of the SIGMORPHON 2020 shared task on
unsupervised morphological paradigm completion (SIGMORPHON 2020 Task 2), a novel …
Meeting the needs of low-resource languages: The value of automatic alignments via pretrained models
Large multilingual models have inspired a new class of word alignment methods, which
work well for the model's pretraining languages. However, the languages most in need of …
The SIGMORPHON 2022 Shared Task on Cross-lingual and Low-Resource Grapheme-to-Phoneme Conversion
Grapheme-to-phoneme conversion is an important component in many speech
technologies, but until recently there were no multilingual benchmarks for this task. The third …
Joint learning model for low-resource agglutinative language morphological tagging
G Abudouwaili, K Abiderexiti, N Yi… - Proceedings of the 20th …, 2023 - aclanthology.org
Due to the lack of data resources, rule-based or transfer learning is mainly used in the
morphological tagging of low-resource languages. However, these methods require expert …
Codex to corpus: Exploring annotation and processing for an open and extensible machine-readable edition of the Florentine Codex
This paper describes an ongoing effort to create, from the original hand-written text, a
machine-readable, linguistically-annotated, and easily-searchable corpus of the Nahuatl …
Developing finite-state language technology for Maya
We describe a suite of finite-state language technologies for Maya, a Mayan language
spoken in Mexico. At the core is a computational model of Maya morphology and phonology …