Integrating automatic transcription into the language documentation workflow: Experiments with Na data and the Persephone toolkit

A Michaud, O Adams, TA Cohn, G Neubig… - 2018
Automatic speech recognition tools have potential for facilitating language documentation,
but in practice these tools remain little-used by linguists for a variety of reasons, such as that …

Automatic speech recognition for supporting endangered language documentation

E Prud'hommeaux, R Jimerson, R Hatcher… - 2021
Generating accurate word-level transcripts of recorded speech for language documentation
is difficult and time-consuming, even for skilled speakers of the target language. Automatic …

User-friendly automatic transcription of low-resource languages: Plugging ESPnet into Elpis

O Adams, B Galliot, G Wisniewski… - arXiv preprint arXiv …, 2020
This paper reports on progress integrating the speech recognition toolkit ESPnet into Elpis, a
web front-end originally designed to provide access to the Kaldi automatic speech …

Utilizing language technology in the documentation of endangered Uralic languages

C Gerstenberger, N Partanen, M Rießler… - … European Journal of …, 2016
The paper describes work-in-progress by the Pite Saami, Kola Saami and Izhva Komi
language documentation projects, all of which record new spoken language data, digitize …

Multilingual dependency parsing for low-resource languages: Case studies on North Saami and Komi-Zyrian

KT Lim, N Partanen, T Poibeau - Proceedings of the Eleventh …, 2018
The paper presents a method for parsing low-resource languages with very small training
corpora using multilingual word embeddings and annotated corpora of larger languages …

Using computational approaches to integrate endangered language legacy data into documentation corpora: Past experiences and challenges ahead

R Blokland, N Partanen, M Rießler… - … , Honolulu, Hawai'i …, 2019
The systematic integration of pre-digital published transcriptions of legacy language
materials offers many possibilities to enrich documentary corpora with data that is often very …

Is There Any Hope for Developing Automated Translation Technology for Sign Languages?

T Jantunen, R Rousi, P Rainò, M Turunen… - Multilingual …, 2021
This article discusses the prerequisites for the machine translation of sign languages. The
topic is complex, including questions relating to technology, interaction design, linguistics …

Instant annotations in ELAN corpora of spoken and written Komi, an endangered language of the Barents Sea region

CV Gerstenberger, N Partanen, M Rießler - 2017
The paper describes work-in-progress by the Izhva Komi language documentation project,
which records new spoken language data, digitizes available recordings and annotates these …

On editing dictionaries for Uralic languages in an online environment

K Alnajjar, M Hämäläinen, J Rueter - Proceedings of the Sixth …, 2020
We present an open online infrastructure for editing and visualization of dictionaries of
different Uralic languages (e.g. Erzya, Moksha, Skolt Sami and Komi-Zyrian). Our …

Building minority dependency treebanks, dictionaries and computational grammars at the same time—an experiment in Karelian treebanking

TA Pirinen - Proceedings of the Third Workshop on Universal …, 2019
Building a treebank from scratch can easily be an elaborate, highly time-consuming task,
especially when working with a minority language with moderately complex morphology and …