A survey on neural speech synthesis
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …
Adaspeech 4: Adaptive text to speech in zero-shot scenarios
Adaptive text to speech (TTS) can synthesize new voices in zero-shot scenarios efficiently,
by using a well-trained source TTS model without adapting it on the speech data of new …
A vector quantized approach for text to speech synthesis on real-world spontaneous speech
Recent Text-to-Speech (TTS) systems trained on reading or acted corpora have
achieved near human-level naturalness. The diversity of human speech, however, often …
Navigating the Soundscape of Deception: A Comprehensive Survey on Audio Deepfake Generation, Detection, and Future Horizons
The rise of audio deepfakes presents a significant security threat that undermines trust in
digital communications and media. These synthetic audio technologies can convincingly …
Content-dependent fine-grained speaker embedding for zero-shot speaker adaptation in text-to-speech synthesis
Zero-shot speaker adaptation aims to clone an unseen speaker's voice without any
adaptation time and parameters. Previous researches usually use a speaker encoder to …
Dailytalk: Spoken dialogue dataset for conversational text-to-speech
The majority of current Text-to-Speech (TTS) datasets, which are collections of individual
utterances, contain few conversational aspects. In this paper, we introduce DailyTalk, a high …
Retrievertts: Modeling decomposed factors for text-based speech insertion
This paper proposes a new "decompose-and-edit" paradigm for the text-based speech
insertion task that facilitates arbitrary-length speech insertion and even full sentence …
Spontaneous style text-to-speech synthesis with controllable spontaneous behaviors based on language models
Spontaneous style speech synthesis, which aims to generate human-like speech, often
encounters challenges due to the scarcity of high-quality data and limitations in model …
[Book][B] Neural text-to-speech synthesis
X Tan - 2023 - Springer
Speaking is one of the most important language capabilities (the others being listening,
reading, and writing) of human beings. Text-to-speech synthesis (TTS for short), which aims …
Towards spontaneous style modeling with semi-supervised pre-training for conversational text-to-speech synthesis
The spontaneous behavior that often occurs in conversations makes speech more human-
like compared to reading-style. However, synthesizing spontaneous-style speech is …