Endangered Languages are not Low-Resourced!
M Hämäläinen - arXiv preprint arXiv:2103.09567, 2021 - arxiv.org
The term low-resourced has been tossed around in the field of natural language processing
to a degree that almost any language that is not English can be called "low-resourced"; …
When word embeddings become endangered
K Alnajjar - arXiv preprint arXiv:2103.13275, 2021 - arxiv.org
Big languages such as English and Finnish have many natural language processing (NLP)
resources and models, but this is not the case for low-resourced and endangered languages …
Sentiment analysis using aligned word embeddings for uralic languages
In this paper, we present an approach for translating word embeddings from a majority
language into 4 minority languages: Erzya, Moksha, Udmurt and Komi-Zyrian. Furthermore …
Using graph-based methods to augment online dictionaries of endangered languages
K Alnajjar, M Hämäläinen… - Workshop on the …, 2022 - researchportal.helsinki.fi
Many endangered Uralic languages have multilingual machine readable dictionaries saved
in an XML format. However, the dictionaries cover translations very inconsistently between …
Modelling the Reduplicating Lushootseed Morphology with an FST and LSTM
In this paper, we present an FST based approach for conducting morphological analysis,
lemmatization and generation of Lushootseed words. Furthermore, we use the FST to …
Prerequisites for shallow-transfer machine translation of Mordvin languages: Language documentation with a purpose
J Rueter, M Hämäläinen - 2021 - preprints.org
This paper presents the current lexical, morphological, syntactic and rule-based machine
translation work for Erzya and Moksha that can and should be used in the development of a …
Lexd: A finite-state lexicon compiler for non-suffixational morphologies
D Swanson, N Howell - Multilingual Facilitation, 2021 - pdfs.semanticscholar.org
This paper presents lexd, a lexicon compiler for languages with non-suffixational
morphology, which is intended to be faster and easier to use than existing solutions while …
Working Towards Digital Documentation of Uralic Languages With Open-Source Tools and Modern NLP Methods
We present our work towards building an infrastructure for documenting endangered
languages with the focus on Uralic languages in particular. Our infrastructure consists of …
DAG: Dictionary-Augmented Generation for Disambiguation of Sentences in Endangered Uralic Languages using ChatGPT
M Hämäläinen - arXiv preprint arXiv:2411.01531, 2024 - arxiv.org
We showcase that ChatGPT can be used to disambiguate lemmas in two endangered
languages ChatGPT is not proficient in, namely Erzya and Skolt Sami. We augment our …
On Erzya and Moksha Corpora and Analyzer Development, ERME-PSLA 1950s
J Rueter, O Erina, N Kabaeva - Proceedings of the 9th …, 2024 - aclanthology.org
This paper describes materials and annotation facilitation pertinent to the «Erzya-Moksha
Electronic Resources and Linguistic Diversity» (EMERALD) project. It addresses work …