Utilizing language technology in the documentation of endangered Uralic languages

Fine-tuning pre-trained models for Automatic Speech Recognition, experiments on a fieldwork corpus of Japhug (Trans-Himalayan family)

S Guillaume, G Wisniewski, C Macaire… - Proceedings of the …, 2022 - aclanthology.org

This is a report on results obtained in the development of speech recognition tools intended
to support linguistic documentation efforts. The test case is an extensive fieldwork corpus of …

Cited by 28

Dialect text normalization to normative standard Finnish

N Partanen, M Hämäläinen… - Workshop on Noisy …, 2019 - researchportal.helsinki.fi

We compare different LSTMs and transformer models in terms of their effectiveness in
normalizing dialectal Finnish into the normative standard Finnish. As dialect is the common …

Cited by 29

Dependency parsing of code-switching data with cross-lingual feature representations

N Partanen, KT Lim, M Rießler, T Poibeau - International Workshop on …, 2018 - hal.science

This paper describes the test of a dependency parsing method which is based on
bidirectional LSTM feature representations and multilingual word embedding, and evaluates …

Cited by 25

Multilingual dependency parsing for low-resource languages: Case studies on North Saami and Komi-Zyrian

KT Lim, N Partanen, T Poibeau - Proceedings of the Eleventh …, 2018 - aclanthology.org

The paper presents a method for parsing low-resource languages with very small training
corpora using multilingual word embeddings and annotated corpora of larger languages …

Cited by 22

Using computational approaches to integrate endangered language legacy data into documentation corpora: Past experiences and challenges ahead

R Blokland, N Partanen, M Rießler… - … , Honolulu, Hawai'i …, 2019 - computel-workshop.org

The systematic integration of pre-digital published transcriptions of legacy language
materials offers many possibilities to enrich documentary corpora with data that is often very …

Cited by 17

Instant annotations in ELAN corpora of spoken and written Komi, an endangered language of the Barents Sea region

CV Gerstenberger, N Partanen, M Rießler - 2017 - munin.uit.no

The paper describes work-in-progress by the Izhva Komi language documentation project,
which records new spoken language data, digitizes available recordings and annotates these …

Cited by 21

The relevance of the source language in transfer learning for ASR

N Hjortnæs, N Partanen, M Rießler… - Proceedings of the …, 2021 - journals.colorado.edu

This study presents new experiments on Zyrian Komi speech recognition. We use DeepSpeech
to train ASR models from a language documentation corpus that contains both …

Cited by 10

SpoCo – a simple and adaptable web interface for dialect corpora

R von Waldenfels, M Woźniak - Journal for language technology and …, 2016 - jlcl.org

We present SpoCo, a simple, yet effective system for the web-based query of dialect corpora
encoded in ELAN that provides users with advanced concordancing functions, as well as the …

Cited by 12

Instant annotations – Applying NLP methods to the annotation of spoken language documentation corpora

C Gerstenberger, N Partanen, M Rießler… - Proceedings of the …, 2017 - aclanthology.org

The paper describes work-in-progress by the Pite Saami, Kola Saami and Izhva Komi
language documentation projects, all of which use similar data and technical frameworks …

Cited by 12

Documenting endangered oral histories of the Arctic: A proposed symbiosis for documentary linguistics and oral history research, illustrated by Saami and Komi …

M Rießler, J Wilbur - Oral history meets linguistics, 2017 - erepo.uef.fi

In this chapter, we argue that documentary linguistics, particularly as we practice it in our
own projects, can provide valuable resources for social science research. Especially in our …

Cited by 9