Fine-tuning pre-trained models for Automatic Speech Recognition, experiments on a fieldwork corpus of Japhug (Trans-Himalayan family)

S Guillaume, G Wisniewski, C Macaire… - Proceedings of the …, 2022 - aclanthology.org
This is a report on results obtained in the development of speech recognition tools intended
to support linguistic documentation efforts. The test case is an extensive fieldwork corpus of …

Dialect text normalization to normative standard Finnish

N Partanen, M Hämäläinen… - Workshop on Noisy …, 2019 - researchportal.helsinki.fi
We compare different LSTMs and transformer models in terms of their effectiveness in
normalizing dialectal Finnish into the normative standard Finnish. As dialect is the common …

Dependency parsing of code-switching data with cross-lingual feature representations

N Partanen, KT Lim, M Rießler, T Poibeau - International Workshop on …, 2018 - hal.science
This paper describes the test of a dependency parsing method which is based on
bidirectional LSTM feature representations and multilingual word embedding, and evaluates …

Multilingual dependency parsing for low-resource languages: Case studies on North Saami and Komi-Zyrian

KT Lim, N Partanen, T Poibeau - Proceedings of the Eleventh …, 2018 - aclanthology.org
The paper presents a method for parsing low-resource languages with very small training
corpora using multilingual word embeddings and annotated corpora of larger languages …

Using computational approaches to integrate endangered language legacy data into documentation corpora: Past experiences and challenges ahead

R Blokland, N Partanen, M Rießler… - … , Honolulu, Hawai'i …, 2019 - computel-workshop.org
The systematic integration of pre-digital published transcriptions of legacy language
materials offers many possibilities to enrich documentary corpora with data that is often very …

Instant annotations in ELAN corpora of spoken and written Komi, an endangered language of the Barents Sea region

CV Gerstenberger, N Partanen, M Rießler - 2017 - munin.uit.no
The paper describes work-in-progress by the Izhva Komi language documentation project,
which records new spoken language data, digitizes available recordings and annotates these …

The relevance of the source language in transfer learning for ASR

N Hjortnæs, N Partanen, M Rießler… - Proceedings of the …, 2021 - journals.colorado.edu
This study presents new experiments on Zyrian Komi speech recognition. We use
DeepSpeech to train ASR models from a language documentation corpus that contains both …

SpoCo – a simple and adaptable web interface for dialect corpora

R von Waldenfels, M Woźniak - Journal for language technology and …, 2016 - jlcl.org
We present SpoCo, a simple, yet effective system for the web-based query of dialect corpora
encoded in ELAN that provides users with advanced concordancing functions, as well as the …

Instant annotations – Applying NLP methods to the annotation of spoken language documentation corpora

C Gerstenberger, N Partanen, M Rießler… - Proceedings of the …, 2017 - aclanthology.org
The paper describes work-in-progress by the Pite Saami, Kola Saami and Izhva Komi
language documentation projects, all of which use similar data and technical frameworks …

Documenting endangered oral histories of the Arctic: A proposed symbiosis for documentary linguistics and oral history research, illustrated by Saami and Komi …

M Rießler, J Wilbur - Oral history meets linguistics, 2017 - erepo.uef.fi
In this chapter, we argue that documentary linguistics, particularly as we practice it in our
own projects, can provide valuable resources for social science research. Especially in our …