A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Mix before align: towards zero-shot cross-lingual sentiment analysis via soft-mix and multi-view learning

Z Zhu, X Cheng, D Chen, Z Huang, H Li… - Proc. of …, 2023 - isca-archive.org
Due to the insufficient sentiment corpus in many languages, recent studies have proposed
cross-lingual sentiment analysis to adapt sentiment analysis models from rich-resource …

Cross-lingual multi-speaker speech synthesis with limited bilingual training data

Z Cai, Y Yang, M Li - Computer Speech & Language, 2023 - Elsevier
Modeling voices for multiple speakers and multiple languages with one speech synthesis
system has been a challenge for a long time, especially in low-resource cases. This paper …

The Blizzard Challenge 2020

X Zhou, ZH Ling, S King - Proc. Joint Workshop for the Blizzard …, 2020 - researchgate.net
The Blizzard Challenge 2020 is the sixteenth annual Blizzard Challenge. The
challenge this year includes a hub task of synthesizing Mandarin speech and a spoke task …

CrossSinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers

X Wang, C Zeng, J Chen… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
It is challenging to build a multi-singer high-fidelity singing voice synthesis system with cross-
lingual ability by only using monolingual singers in the training stage. In this paper, we …

FACTSpeech: Speaking a foreign language pronunciation using only your native characters

HS Yang, JH Kim, YC Ju, IH Kim, BY Kim… - Proc …, 2023 - isca-archive.org
Recent text-to-speech models have been requested to synthesize natural speech from
language-mixed sentences because they are commonly used in real-world applications …

Generative error correction for code-switching speech recognition using large language models

C Chen, Y Hu, CHH Yang, H Liu, SM Siniscalchi… - arXiv preprint arXiv …, 2023 - arxiv.org
Code-switching (CS) speech refers to the phenomenon of mixing two or more languages
within the same sentence. Despite the recent advances in automatic speech recognition …

Using IPA-based Tacotron for data efficient cross-lingual speaker adaptation and pronunciation enhancement

H Hemati, D Borth - arXiv preprint arXiv:2011.06392, 2020 - arxiv.org
Recent neural Text-to-Speech (TTS) models have been shown to perform very well when
enough data is available. However, fine-tuning them for new speakers or languages is not …

Cross-lingual multispeaker text-to-speech under limited-data scenario

Z Cai, Y Yang, M Li - arXiv preprint arXiv:2005.10441, 2020 - arxiv.org
Modeling voices for multiple speakers and multiple languages in one text-to-speech system
has been a challenge for a long time. This paper presents an extension on Tacotron2 to …

Deep learning-based speaker-adaptive postfiltering with limited adaptation data for embedded text-to-speech synthesis systems

E Eren, C Demiroglu - Computer Speech & Language, 2023 - Elsevier
End-to-end (e2e) speech synthesis systems have become popular with the recent
introduction of text-to-spectrogram conversion systems, such as Tacotron, that use encoder …