Naturalspeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers

K Shen, Z Ju, X Tan, Y Liu, Y Leng, L He, T Qin… - arXiv preprint arXiv …, 2023 - arxiv.org

Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is
important to capture the diversity in human speech such as speaker identities, prosodies …

Cited by 197 Related articles All 3 versions

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Cited by 454 Related articles All 2 versions

Naturalspeech: End-to-end text-to-speech synthesis with human-level quality

X Tan, J Chen, H Liu, J Cong, C Zhang… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Text-to-speech (TTS) has made rapid progress in both academia and industry in recent
years. Some questions naturally arise that whether a TTS system can achieve human-level …

Cited by 206 Related articles All 9 versions

Fastspeech 2: Fast and high-quality end-to-end text to speech

Y Ren, C Hu, X Tan, T Qin, S Zhao, Z Zhao… - arXiv preprint arXiv …, 2020 - arxiv.org

Non-autoregressive text to speech (TTS) models such as FastSpeech can synthesize
speech significantly faster than previous autoregressive models with comparable quality …

Cited by 1574 Related articles All 3 versions

Prodiff: Progressive fast diffusion model for high-quality text-to-speech

R Huang, Z Zhao, H Liu, J Liu, C Cui… - Proceedings of the 30th …, 2022 - dl.acm.org

Denoising diffusion probabilistic models (DDPMs) have recently achieved leading
performances in many generative tasks. However, the inherited iterative sampling process …

Cited by 174 Related articles All 3 versions

Fastspeech: Fast, robust and controllable text to speech

Y Ren, Y Ruan, X Tan, T Qin, S Zhao… - Advances in neural …, 2019 - proceedings.neurips.cc

Neural network based end-to-end text to speech (TTS) has significantly improved the quality
of synthesized speech. Prominent methods (e.g., Tacotron 2) usually first generate mel …

Cited by 1248 Related articles All 10 versions

Adaspeech: Adaptive text to speech for custom voice

M Chen, X Tan, B Li, Y Liu, T Qin, S Zhao… - arXiv preprint arXiv …, 2021 - arxiv.org

Custom voice, a specific text to speech (TTS) service in commercial speech platforms, aims
to adapt a source TTS model to synthesize personal voice for a target speaker using few …

Cited by 192 Related articles All 3 versions

Prompttts: Controllable text-to-speech with text descriptions

Z Guo, Y Leng, Y Wu, S Zhao… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

Using a text description as prompt to guide the generation of text or images (e.g., GPT-3 or
DALLE-2) has drawn wide attention recently. Beyond text and image generation, in this …

Cited by 94 Related articles All 3 versions

Portaspeech: Portable and high-quality generative text-to-speech

Y Ren, J Liu, Z Zhao - Advances in Neural Information …, 2021 - proceedings.neurips.cc

Non-autoregressive text-to-speech (NAR-TTS) models such as FastSpeech 2 and Glow-TTS
can synthesize high-quality speech from the given text in parallel. After analyzing two kinds …

Cited by 85 Related articles All 6 versions

Lrspeech: Extremely low-resource speech synthesis and recognition

J Xu, X Tan, Y Ren, T Qin, J Li, S Zhao… - Proceedings of the 26th …, 2020 - dl.acm.org

Speech synthesis (text to speech, TTS) and recognition (automatic speech recognition, ASR)
are important speech tasks, and require a large amount of text and speech pairs for model …

Cited by 101 Related articles All 4 versions