A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

AdaSpeech 4: Adaptive text to speech in zero-shot scenarios

Y Wu, X Tan, B Li, L He, S Zhao, R Song, T Qin… - arXiv preprint arXiv …, 2022 - arxiv.org
Adaptive text to speech (TTS) can synthesize new voices in zero-shot scenarios efficiently,
by using a well-trained source TTS model without adapting it on the speech data of new …

A vector quantized approach for text to speech synthesis on real-world spontaneous speech

LW Chen, S Watanabe, A Rudnicky - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Abstract Recent Text-to-Speech (TTS) systems trained on reading or acted corpora have
achieved near human-level naturalness. The diversity of human speech, however, often …

Navigating the Soundscape of Deception: A Comprehensive Survey on Audio Deepfake Generation, Detection, and Future Horizons

TM Wani, SAA Qadri, FA Wani… - Foundations and Trends …, 2024 - nowpublishers.com
The rise of audio deepfakes presents a significant security threat that undermines trust in
digital communications and media. These synthetic audio technologies can convincingly …

Content-dependent fine-grained speaker embedding for zero-shot speaker adaptation in text-to-speech synthesis

Y Zhou, C Song, X Li, L Zhang, Z Wu, Y Bian… - arXiv preprint arXiv …, 2022 - arxiv.org
Zero-shot speaker adaptation aims to clone an unseen speaker's voice without any
adaptation time and parameters. Previous researches usually use a speaker encoder to …

DailyTalk: Spoken dialogue dataset for conversational text-to-speech

K Lee, K Park, D Kim - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
The majority of current Text-to-Speech (TTS) datasets, which are collections of individual
utterances, contain few conversational aspects. In this paper, we introduce DailyTalk, a high …

RetrieverTTS: Modeling decomposed factors for text-based speech insertion

D Yin, C Tang, Y Liu, X Wang, Z Zhao, Y Zhao… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper proposes a new "decompose-and-edit" paradigm for the text-based speech
insertion task that facilitates arbitrary-length speech insertion and even full sentence …

Spontaneous style text-to-speech synthesis with controllable spontaneous behaviors based on language models

W Li, P Yang, Y Zhong, Y Zhou, Z Wang, Z Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Spontaneous style speech synthesis, which aims to generate human-like speech, often
encounters challenges due to the scarcity of high-quality data and limitations in model …

[BOOK][B] Neural text-to-speech synthesis

X Tan - 2023 - Springer
Speaking is one of the most important language capabilities (the others being listening,
reading, and writing) of human beings. Text-to-speech synthesis (TTS for short), which aims …

Towards spontaneous style modeling with semi-supervised pre-training for conversational text-to-speech synthesis

W Li, S Lei, Q Huang, Y Zhou, Z Wu, S Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
The spontaneous behavior that often occurs in conversations makes speech more human-like
compared to reading-style. However, synthesizing spontaneous-style speech is …