Adapting multilingual speech representation model for a new, underresourced language through multilingual fine-tuning and continued pretraining

K Nowakowski, M Ptaszynski, K Murasaki… - Information Processing …, 2023 - Elsevier
In recent years, neural models learned through self-supervised pretraining on large scale
multilingual text or speech data have exhibited promising results for underresourced …

Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yoloxóchitl Mixtec

J Shi, JD Amith, RC García, EG Sierra, K Duh… - arXiv preprint arXiv …, 2021 - arxiv.org
"Transcription bottlenecks", created by a shortage of effective human transcribers are one of
the main challenges to endangered language (EL) documentation. Automatic speech …

Development of automatic speech recognition for the documentation of Cook Islands Māori

R Coto-Solano, SA Nicholas, S Datta… - Proceedings of the …, 2022 - aclanthology.org
This paper describes the process of data processing and training of an automatic speech
recognition (ASR) system for Cook Islands Māori (CIM), an Indigenous language spoken by …

User-friendly automatic transcription of low-resource languages: Plugging ESPnet into Elpis

O Adams, B Galliot, G Wisniewski… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper reports on progress integrating the speech recognition toolkit ESPnet into Elpis, a
web front-end originally designed to provide access to the Kaldi automatic speech …

Speech-to-text recognition for multilingual spoken data in language documentation

LM Rodríguez, C Cox - Proceedings of the sixth workshop on the …, 2023 - aclanthology.org
More than 85% of the languages spoken in Canada are deemed as vulnerable (Lewis,
2009). Efforts for language revitalization and maintenance are essential to maintain both the …

Speech recognition for endangered and extinct Samoyedic languages

N Partanen, M Hämäläinen, T Klooster - arXiv preprint arXiv:2012.05331, 2020 - arxiv.org
Our study presents a series of experiments on speech recognition with endangered and
extinct Samoyedic languages, spoken in Northern and Southern Siberia. To best of our …

Explicit tone transcription improves ASR performance in extremely low-resource languages: A case study in Bribri

R Coto-Solano - Proceedings of the first workshop on natural …, 2021 - aclanthology.org
Linguistic tone is transcribed for input into ASR systems in numerous ways. This paper
shows a systematic test of several transcription styles, using as an example the Chibchan …

When word embeddings become endangered

K Alnajjar - arXiv preprint arXiv:2103.13275, 2021 - arxiv.org
Big languages such as English and Finnish have many natural language processing (NLP)
resources and models, but this is not the case for low-resourced and endangered languages …

End-to-end automatic speech recognition: Its impact on the workflow for documenting Yoloxóchitl Mixtec

JD Amith, J Shi, R Castillo Garcia - … on NLP for Indigenous Languages of …, 2021 - par.nsf.gov
This paper describes three open access Yoloxóchitl Mixtec corpora and presents the results
and implications of end-to-end automatic speech recognition for endangered language …

Effects of layer freezing on transferring a speech recognition system to under-resourced languages

O Eberhard, T Zesch - Proceedings of the 17th Conference on …, 2021 - aclanthology.org
In this paper, we investigate the effect of layer freezing on the effectiveness of model transfer
in the area of automatic speech recognition. We experiment with Mozilla's Deep-Speech …