Open-source multi-speaker speech corpora for building Gujarati, Kannada, Malayalam, Marathi, Tamil and Telugu speech synthesis systems
F He, SHC Chu, O Kjartansson, C Rivera… - Proceedings of the …, 2020 - aclanthology.org
We present free high quality multi-speaker speech corpora for Gujarati, Kannada,
Malayalam, Marathi, Tamil and Telugu, which are six of the twenty-two official languages of …
A systematic review and analysis of multilingual data strategies in text-to-speech for low-resource languages
We provide a systematic review of past studies that use multilingual data for text-to-speech
(TTS) of low-resource languages (LRLs). We focus on the strategies used by these studies …
Open-source multi-speaker corpora of the English accents in the British Isles
This paper presents a dataset of transcribed high-quality audio of English sentences
recorded by volunteers speaking with different accents of the British Isles. The dataset is …
Crowd-Sourced Speech Corpora for Javanese, Sundanese, Sinhala, Nepali, and Bangladeshi Bengali.
We present speech corpora for Javanese, Sundanese, Sinhala, Nepali, and Bangladeshi
Bengali. Each corpus consists of an average of approximately 200k recorded utterances that …
Crowdsourcing Latin American Spanish for low-resource text-to-speech
A Guevara-Rukoz, I Demirsahin, F He… - Proceedings of the …, 2020 - aclanthology.org
In this paper we present a multidialectal corpus approach for building a text-to-speech voice
for a new dialect in a language with existing resources, focusing on various South American …
A Step-by-Step Process for Building TTS Voices Using Open Source Data and Frameworks for Bangla, Javanese, Khmer, Nepali, Sinhala, and Sundanese.
The availability of language resources is vital for the development of text-to-speech (TTS)
systems. Thus, open source resources are highly beneficial for TTS research communities …
Developing an open-source corpus of Yoruba speech
This paper introduces an open-source speech dataset for Yoruba–one of the largest low-
resource West African languages spoken by at least 22 million people. Yoruba is one of the …
Open-source high quality speech datasets for Basque, Catalan and Galician
O Kjartansson, A Gutkin, A Butryna… - Proceedings of the …, 2020 - aclanthology.org
This paper introduces new open speech datasets for three of the languages of Spain:
Basque, Catalan and Galician. Catalan is furthermore the official language of the Principality …
Google crowdsourced speech corpora and related open-source resources for low-resource languages and dialects: an overview
A Butryna, SHC Chu, I Demirsahin, A Gutkin… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper presents an overview of a program designed to address the growing need for
developing freely available speech resources for under-represented languages. At present …
Burmese speech corpus, finite-state text normalization and pronunciation grammars with an application to text-to-speech
YM Oo, T Wattanavekin, C Li, P De Silva… - Proceedings of the …, 2020 - aclanthology.org
This paper introduces an open-source crowd-sourced multi-speaker speech corpus along
with the comprehensive set of finite-state transducer (FST) grammars for performing text …