Towards a speech recognizer for Komi, an endangered and low-resource Uralic language

Adapting multilingual speech representation model for a new, underresourced language through multilingual fine-tuning and continued pretraining

K Nowakowski, M Ptaszynski, K Murasaki… - Information Processing …, 2023 - Elsevier

In recent years, neural models learned through self-supervised pretraining on large scale
multilingual text or speech data have exhibited promising results for underresourced …

Cited by 29 · All 7 versions

Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yoloxóchitl Mixtec

J Shi, JD Amith, RC García, EG Sierra, K Duh… - arXiv preprint arXiv …, 2021 - arxiv.org

"Transcription bottlenecks," created by a shortage of effective human transcribers, are one of
the main challenges to endangered language (EL) documentation. Automatic speech …

Cited by 39 · All 10 versions

Development of automatic speech recognition for the documentation of Cook Islands Māori

R Coto-Solano, SA Nicholas, S Datta… - Proceedings of the …, 2022 - aclanthology.org

This paper describes the process of data processing and training of an automatic speech
recognition (ASR) system for Cook Islands Māori (CIM), an Indigenous language spoken by …

Cited by 16 · All 5 versions

User-friendly automatic transcription of low-resource languages: Plugging ESPnet into Elpis

O Adams, B Galliot, G Wisniewski… - arXiv preprint arXiv …, 2020 - arxiv.org

This paper reports on progress integrating the speech recognition toolkit ESPnet into Elpis, a
web front-end originally designed to provide access to the Kaldi automatic speech …

Cited by 19 · All 28 versions

Speech-to-text recognition for multilingual spoken data in language documentation

LM Rodríguez, C Cox - Proceedings of the sixth workshop on the …, 2023 - aclanthology.org

More than 85% of the languages spoken in Canada are deemed vulnerable (Lewis,
2009). Efforts for language revitalization and maintenance are essential to maintain both the …

Cited by 8 · All 2 versions

Speech recognition for endangered and extinct Samoyedic languages

N Partanen, M Hämäläinen, T Klooster - arXiv preprint arXiv:2012.05331, 2020 - arxiv.org

Our study presents a series of experiments on speech recognition with endangered and
extinct Samoyedic languages, spoken in Northern and Southern Siberia. To the best of our …

Cited by 13 · All 10 versions

Explicit tone transcription improves ASR performance in extremely low-resource languages: A case study in Bribri

R Coto-Solano - Proceedings of the first workshop on natural …, 2021 - aclanthology.org

Linguistic tone is transcribed for input into ASR systems in numerous ways. This paper
shows a systematic test of several transcription styles, using as an example the Chibchan …

Cited by 13 · All 5 versions

When word embeddings become endangered

K Alnajjar - arXiv preprint arXiv:2103.13275, 2021 - arxiv.org

Big languages such as English and Finnish have many natural language processing (NLP)
resources and models, but this is not the case for low-resourced and endangered languages …

Cited by 10 · All 8 versions

End-to-end automatic speech recognition: Its impact on the workflow for documenting Yoloxóchitl Mixtec

JD Amith, J Shi, R Castillo Garcia - … on NLP for Indigenous Languages of …, 2021 - par.nsf.gov

This paper describes three open access Yoloxóchitl Mixtec corpora and presents the results
and implications of end-to-end automatic speech recognition for endangered language …

Cited by 10 · All 7 versions

Effects of layer freezing on transferring a speech recognition system to under-resourced languages

O Eberhard, T Zesch - Proceedings of the 17th Conference on …, 2021 - aclanthology.org

In this paper, we investigate the effect of layer freezing on the effectiveness of model transfer
in the area of automatic speech recognition. We experiment with Mozilla's DeepSpeech …

Cited by 11 · All 5 versions