MURAL: multimodal, multitask retrieval across languages

A Jain, M Guo, K Srinivasan, T Chen… - arXiv preprint arXiv …, 2021 - arxiv.org
Both image-caption pairs and translation pairs provide the means to learn deep
representations of and connections between languages. We use both types of pairs in …

A study on the challenges and opportunities of speech recognition for Bengali language

MF Mridha, AQ Ohi, MA Hamid… - Artificial Intelligence …, 2022 - Springer
Speech recognition is a fascinating process that offers the opportunity to interact and
command the machine in the field of human-computer interactions. Speech recognition is a …

Mlphon: A multifunctional grapheme-phoneme conversion tool using finite state transducers

K Manohar, AR Jayan, R Rajan - IEEE Access, 2022 - ieeexplore.ieee.org
In this article we present the design and the development of a knowledge based
computational linguistic tool, Mlphon for Malayalam language. Mlphon computationally …

Rasa: Building Expressive Speech Synthesis Systems for Indian Languages in Low-resource Settings

PS Varadhan, A Sankar, G Raju, MM Khapra - arXiv preprint arXiv …, 2024 - arxiv.org
We release Rasa, the first multilingual expressive TTS dataset for any Indian language,
which contains 10 hours of neutral speech and 1-3 hours of expressive speech for each of …

Strategies in transfer learning for low-resource speech synthesis: Phone mapping, features input, and source language selection

P Do, M Coler, J Dijkstra, E Klabbers - arXiv preprint arXiv:2306.12040, 2023 - arxiv.org
We compare using a PHOIBLE-based phone mapping method and using phonological
features input in transfer learning for TTS in low-resource languages. We use diverse source …

IndicVoices-R: Unlocking a massive multilingual multi-speaker speech corpus for scaling Indian TTS

A Sankar, S Anand, PS Varadhan, S Thomas… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in text-to-speech (TTS) synthesis show that large-scale models
trained with extensive web data produce highly natural-sounding output. However, such …

The LDC-IL speech corpora

N Choudhary, DG Rao - 2020 23rd Conference of the Oriental …, 2020 - ieeexplore.ieee.org
This paper introduces the first set of speech corpora released in 2019 by the Linguistic Data
Consortium for Indian Languages (LDC-IL), a scheme under the Department of Higher …

SUST TTS Corpus: A phonetically-balanced corpus for Bangla text-to-speech synthesis

A Ahmad, MR Selim, MZ Iqbal… - Acoustical Science and …, 2021 - jstage.jst.go.jp
This paper presents the Shahjalal University of Science and Technology Text-To-Speech
Corpus (SUST TTS Corpus), a phonetically balanced speech corpus for Bangla speech …

Data-efficient training strategies for neural TTS systems

KR Prajwal, CV Jawahar - Proceedings of the 3rd ACM India Joint …, 2021 - dl.acm.org
India is a country with thousands of languages and dialects spoken across a billion-strong
population. For multi-lingual content creation and accessibility, text-to-speech systems will …

Challenges and opportunities of speech recognition for Bengali language

MF Mridha, AQ Ohi, MA Hamid… - arXiv preprint arXiv …, 2021 - arxiv.org
Speech recognition is a fascinating process that offers the opportunity to interact and
command the machine in the field of human-computer interactions. Speech recognition is a …