AdaSpeech 3: Adaptive text to speech for spontaneous style

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Cited by 445 · Related articles · All 2 versions

AdaSpeech 4: Adaptive text to speech in zero-shot scenarios

Y Wu, X Tan, B Li, L He, S Zhao, R Song, T Qin… - arXiv preprint arXiv …, 2022 - arxiv.org

Adaptive text to speech (TTS) can synthesize new voices in zero-shot scenarios efficiently,
by using a well-trained source TTS model without adapting it on the speech data of new …

Cited by 67 · Related articles · All 6 versions

A vector quantized approach for text to speech synthesis on real-world spontaneous speech

LW Chen, S Watanabe, A Rudnicky - Proceedings of the AAAI …, 2023 - ojs.aaai.org

Abstract Recent Text-to-Speech (TTS) systems trained on reading or acted corpora have
achieved near human-level naturalness. The diversity of human speech, however, often …

Cited by 40 · Related articles · All 6 versions

Navigating the Soundscape of Deception: A Comprehensive Survey on Audio Deepfake Generation, Detection, and Future Horizons

TM Wani, SAA Qadri, FA Wani… - Foundations and Trends …, 2024 - nowpublishers.com

The rise of audio deepfakes presents a significant security threat that undermines trust in
digital communications and media. These synthetic audio technologies can convincingly …

Content-dependent fine-grained speaker embedding for zero-shot speaker adaptation in text-to-speech synthesis

Y Zhou, C Song, X Li, L Zhang, Z Wu, Y Bian… - arXiv preprint arXiv …, 2022 - arxiv.org

Zero-shot speaker adaptation aims to clone an unseen speaker's voice without any
adaptation time and parameters. Previous researches usually use a speaker encoder to …

Cited by 27 · Related articles · All 6 versions

DailyTalk: Spoken dialogue dataset for conversational text-to-speech

K Lee, K Park, D Kim - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org

The majority of current Text-to-Speech (TTS) datasets, which are collections of individual
utterances, contain few conversational aspects. In this paper, we introduce DailyTalk, a high …

Cited by 32 · Related articles · All 4 versions

RetrieverTTS: Modeling decomposed factors for text-based speech insertion

D Yin, C Tang, Y Liu, X Wang, Z Zhao, Y Zhao… - arXiv preprint arXiv …, 2022 - arxiv.org

This paper proposes a new "decompose-and-edit" paradigm for the text-based speech
insertion task that facilitates arbitrary-length speech insertion and even full sentence …

Cited by 14 · Related articles · All 8 versions

Spontaneous style text-to-speech synthesis with controllable spontaneous behaviors based on language models

W Li, P Yang, Y Zhong, Y Zhou, Z Wang, Z Wu… - arXiv preprint arXiv …, 2024 - arxiv.org

Spontaneous style speech synthesis, which aims to generate human-like speech, often
encounters challenges due to the scarcity of high-quality data and limitations in model …

Cited by 4 · Related articles · All 4 versions

[BOOK][B] Neural text-to-speech synthesis

X Tan - 2023 - Springer

Speaking is one of the most important language capabilities (the others being listening,
reading, and writing) of human beings. Text-to-speech synthesis (TTS for short), which aims …

Cited by 13 · Related articles · All 4 versions

Towards spontaneous style modeling with semi-supervised pre-training for conversational text-to-speech synthesis

W Li, S Lei, Q Huang, Y Zhou, Z Wu, S Kang… - arXiv preprint arXiv …, 2023 - arxiv.org

The spontaneous behavior that often occurs in conversations makes speech more human-like
compared to reading-style. However, synthesizing spontaneous-style speech is …

Cited by 5 · Related articles · All 4 versions