Language documentation meets language technology

Integrating automatic transcription into the language documentation workflow: Experiments with Na data and the Persephone toolkit

A Michaud, O Adams, TA Cohn, G Neubig… - 2018 - scholarspace.manoa.hawaii.edu

Automatic speech recognition tools have potential for facilitating language documentation,
but in practice these tools remain little-used by linguists for a variety of reasons, such as that …

Cited by 70 · Related articles · All 11 versions

Automatic speech recognition for supporting endangered language documentation

E Prud'hommeaux, R Jimerson, R Hatcher… - 2021 - scholarspace.manoa.hawaii.edu

Generating accurate word-level transcripts of recorded speech for language documentation
is difficult and time-consuming, even for skilled speakers of the target language. Automatic …

Cited by 32 · Related articles · All 5 versions

User-friendly automatic transcription of low-resource languages: Plugging ESPnet into Elpis

O Adams, B Galliot, G Wisniewski… - arXiv preprint arXiv …, 2020 - arxiv.org

This paper reports on progress integrating the speech recognition toolkit ESPnet into Elpis, a
web front-end originally designed to provide access to the Kaldi automatic speech …

Cited by 19 · Related articles · All 28 versions

Utilizing language technology in the documentation of endangered Uralic languages

C Gerstenberger, N Partanen, M Rießler… - … European Journal of …, 2016 - nejlt.ep.liu.se

The paper describes work-in-progress by the Pite Saami, Kola Saami and Izhva Komi
language documentation projects, all of which record new spoken language data, digitize …

Cited by 29 · Related articles · All 9 versions

Multilingual dependency parsing for low-resource languages: Case studies on North Saami and Komi-Zyrian

KT Lim, N Partanen, T Poibeau - Proceedings of the Eleventh …, 2018 - aclanthology.org

The paper presents a method for parsing low-resource languages with very small training
corpora using multilingual word embeddings and annotated corpora of larger languages …

Cited by 21 · Related articles · All 5 versions

Using computational approaches to integrate endangered language legacy data into documentation corpora: Past experiences and challenges ahead

R Blokland, N Partanen, M Rießler… - … , Honolulu, Hawai'i …, 2019 - computel-workshop.org

The systematic integration of pre-digital published transcriptions of legacy language
materials offers many possibilities to enrich documentary corpora with data that is often very …

Cited by 17 · Related articles · All 6 versions

Is There Any Hope for Developing Automated Translation Technology for Sign Languages?

T Jantunen, R Rousi, P Rainò, M Turunen… - Multilingual …, 2021 - books.google.com

This article discusses the prerequisites for the machine translation of sign languages. The
topic is complex, including questions relating to technology, interaction design, linguistics …

Cited by 12 · Related articles · All 9 versions

Instant annotations in ELAN corpora of spoken and written Komi, an endangered language of the Barents Sea region

CV Gerstenberger, N Partanen, M Rießler - 2017 - munin.uit.no

The paper describes work-in-progress by the Izhva Komi language documentation project,
which records new spoken language data, digitizes available recordings and annotate these …

Cited by 19 · Related articles · All 9 versions

On editing dictionaries for Uralic languages in an online environment

K Alnajjar, M Hämäläinen, J Rueter - Proceedings of the Sixth …, 2020 - aclanthology.org

We present an open online infrastructure for editing and visualization of dictionaries of
different Uralic languages (e.g. Erzya, Moksha, Skolt Sami and Komi-Zyrian). Our …

Cited by 11 · Related articles · All 12 versions

Building minority dependency treebanks, dictionaries and computational grammars at the same time—an experiment in Karelian treebanking

TA Pirinen - Proceedings of the Third Workshop on Universal …, 2019 - aclanthology.org

Building a treebank from scratch can easily be an elaborate, highly time consuming task,
especially when working with a minority language with moderately complex morphology and …

Cited by 12 · Related articles · All 3 versions