Open-source multi-speaker speech corpora for building Gujarati, Kannada, Malayalam, Marathi, Tamil and Telugu speech synthesis systems

F He, SHC Chu, O Kjartansson, C Rivera… - Proceedings of the …, 2020 - aclanthology.org
We present free high quality multi-speaker speech corpora for Gujarati, Kannada,
Malayalam, Marathi, Tamil and Telugu, which are six of the twenty two official languages of …

A systematic review and analysis of multilingual data strategies in text-to-speech for low-resource languages

P Do, M Coler, J Dijkstra, E Klabbers - Interspeech 2021, 2021 - research.rug.nl
We provide a systematic review of past studies that use multilingual data for text-to-speech
(TTS) of low-resource languages (LRLs). We focus on the strategies used by these studies …

Open-source multi-speaker corpora of the English accents in the British Isles

I Demirsahin, O Kjartansson, A Gutkin… - Proceedings of the …, 2020 - aclanthology.org
This paper presents a dataset of transcribed high-quality audio of English sentences
recorded by volunteers speaking with different accents of the British Isles. The dataset is …

Crowd-Sourced Speech Corpora for Javanese, Sundanese, Sinhala, Nepali, and Bangladeshi Bengali.

O Kjartansson, S Sarin, K Pipatsrisawat, M Jansche… - SLTU, 2018 - isca-archive.org
We present speech corpora for Javanese, Sundanese, Sinhala, Nepali, and Bangladeshi
Bengali. Each corpus consists of an average of approximately 200k recorded utterances that …

Crowdsourcing Latin American Spanish for low-resource text-to-speech

A Guevara-Rukoz, I Demirsahin, F He… - Proceedings of the …, 2020 - aclanthology.org
In this paper we present a multidialectal corpus approach for building a text-to-speech voice
for a new dialect in a language with existing resources, focusing on various South American …

A Step-by-Step Process for Building TTS Voices Using Open Source Data and Frameworks for Bangla, Javanese, Khmer, Nepali, Sinhala, and Sundanese.

K Sodimana, P De Silva, S Sarin, O Kjartansson… - SLTU, 2018 - isca-archive.org
The availability of language resources is vital for the development of text-to-speech (TTS)
systems. Thus, open source resources are highly beneficial for TTS research communities …

Developing an open-source corpus of Yoruba speech

A Gutkin, I Demirsahin, O Kjartansson, CE Rivera… - 2020 - isca-archive.org
This paper introduces an open-source speech dataset for Yoruba, one of the largest
low-resource West African languages, spoken by at least 22 million people. Yoruba is one of the …

Open-source high quality speech datasets for Basque, Catalan and Galician

O Kjartansson, A Gutkin, A Butryna… - Proceedings of the …, 2020 - aclanthology.org
This paper introduces new open speech datasets for three of the languages of Spain:
Basque, Catalan and Galician. Catalan is furthermore the official language of the Principality …

Google crowdsourced speech corpora and related open-source resources for low-resource languages and dialects: an overview

A Butryna, SHC Chu, I Demirsahin, A Gutkin… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper presents an overview of a program designed to address the growing need for
developing freely available speech resources for under-represented languages. At present …

Burmese speech corpus, finite-state text normalization and pronunciation grammars with an application to text-to-speech

YM Oo, T Wattanavekin, C Li, P De Silva… - Proceedings of the …, 2020 - aclanthology.org
This paper introduces an open-source crowd-sourced multi-speaker speech corpus along
with the comprehensive set of finite-state transducer (FST) grammars for performing text …