A survey on neural speech synthesis
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …
Mix before align: towards zero-shot cross-lingual sentiment analysis via soft-mix and multi-view learning
Due to the insufficient sentiment corpus in many languages, recent studies have proposed
cross-lingual sentiment analysis to adapt sentiment analysis models from rich-resource …
Cross-lingual multi-speaker speech synthesis with limited bilingual training data
Modeling voices for multiple speakers and multiple languages with one speech synthesis
system has been a challenge for a long time, especially in low-resource cases. This paper …
The Blizzard Challenge 2020
The Blizzard Challenge 2020 is the sixteenth annual Blizzard Challenge. The
challenge this year includes a hub task of synthesizing Mandarin speech and a spoke task …
CrossSinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers
It is challenging to build a multi-singer high-fidelity singing voice synthesis system with cross-
lingual ability by only using monolingual singers in the training stage. In this paper, we …
FACTSpeech: Speaking a foreign language pronunciation using only your native characters
Recent text-to-speech models have been requested to synthesize natural speech from
language-mixed sentences because they are commonly used in real-world applications …
Generative error correction for code-switching speech recognition using large language models
Code-switching (CS) speech refers to the phenomenon of mixing two or more languages
within the same sentence. Despite the recent advances in automatic speech recognition …
Using IPA-based Tacotron for data efficient cross-lingual speaker adaptation and pronunciation enhancement
H Hemati, D Borth - arXiv preprint arXiv:2011.06392, 2020 - arxiv.org
Recent neural Text-to-Speech (TTS) models have been shown to perform very well when
enough data is available. However, fine-tuning them for new speakers or languages is not …
Cross-lingual multispeaker text-to-speech under limited-data scenario
Modeling voices for multiple speakers and multiple languages in one text-to-speech system
has been a challenge for a long time. This paper presents an extension on Tacotron2 to …
Deep learning-based speaker-adaptive postfiltering with limited adaptation data for embedded text-to-speech synthesis systems
E Eren, C Demiroglu - Computer Speech & Language, 2023 - Elsevier
End-to-end (e2e) speech synthesis systems have become popular with the recent
introduction of text-to-spectrogram conversion systems, such as Tacotron, that use encoder …