DiCLET-TTS: Diffusion model based cross-lingual emotion transfer for text-to-speech—A study between English and Mandarin
While the performance of cross-lingual TTS based on monolingual corpora has been
significantly improved recently, generating cross-lingual speech still suffers from the foreign …
Exploring the role of language families for building Indic speech synthesisers
Building end-to-end speech synthesisers for Indian languages is challenging, given the lack
of adequate clean training data and multiple grapheme representations across languages …
Unify and conquer: How phonetic feature representation affects polyglot text-to-speech (TTS)
An essential design decision for multilingual Neural Text-To-Speech (NTTS) systems is how
to represent input linguistic features within the model. Looking at the wide variety of …
Mix and match: an empirical study on training corpus composition for polyglot text-to-speech (TTS)
Training multilingual Neural Text-To-Speech (NTTS) models using only monolingual
corpora has emerged as a popular way for building voice cloning based Polyglot NTTS …
Cross-lingual style transfer with conditional prior VAE and style loss
In this work we improve the style representation for crosslingual style transfer. Specifically,
we improve the Spanish representation across four styles, Newscaster, DJ, Excited, and …
Exploring timbre disentanglement in non-autoregressive cross-lingual text-to-speech
In this paper, we study the disentanglement of speaker and language representations in non-
autoregressive cross-lingual TTS models from various aspects. We propose a phoneme …
Speech generation for indigenous language education
As the quality of contemporary speech synthesis improves, so too does the interest from
language communities in developing text-to-speech (TTS) systems for a variety of real-world …
Beyond graphemes and phonemes: continuous phonological features in neural text-to-speech synthesis
We introduce continuous phonological features as input to TTS with the dual objective of
more precise control over phonological aspects and better potential for exploration of latent …
Few-shot cross-lingual TTS using transferable phoneme embedding
This paper studies a transferable phoneme embedding framework that aims to deal with the
cross-lingual text-to-speech (TTS) problem under the few-shot setting. Transfer learning is a …
Self-supervised learning for robust voice cloning
K Klapsas, N Ellinas, K Nikitaras… - arXiv preprint arXiv …, 2022 - arxiv.org
Voice cloning is a difficult task which requires robust and informative features incorporated
in a high quality TTS system in order to effectively copy an unseen speaker's voice. In our …