AdaSpeech: Adaptive text to speech for custom voice

A comprehensive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT

Y Cao, S Li, Y Liu, Z Yan, Y Dai, PS Yu… - arXiv preprint arXiv …, 2023 - arxiv.org

Recently, ChatGPT, along with DALL-E-2 and Codex, has been gaining significant attention
from society. As a result, many individuals have become interested in related resources and …

Cited by 625 - Related articles - All 2 versions

A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 150 - Related articles - All 6 versions

Neural codec language models are zero-shot text to speech synthesizers

C Wang, S Chen, Y Wu, Z Zhang, L Zhou, S Liu… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce a language modeling approach for text to speech synthesis (TTS). Specifically,
we train a neural codec language model (called VALL-E) using discrete codes derived from …

Cited by 490 - Related articles - All 3 versions

Make-An-Audio: Text-to-audio generation with prompt-enhanced diffusion models

R Huang, J Huang, D Yang, Y Ren… - International …, 2023 - proceedings.mlr.press

Large-scale multimodal generative modeling has created milestones in text-to-image and
text-to-video generation. Its application to audio still lags behind for two main reasons: the …

Cited by 228 - Related articles - All 7 versions

NaturalSpeech: End-to-end text-to-speech synthesis with human-level quality

X Tan, J Chen, H Liu, J Cong, C Zhang… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Text-to-speech (TTS) has made rapid progress in both academia and industry in recent
years. Some questions naturally arise that whether a TTS system can achieve human-level …

Cited by 171 - Related articles - All 9 versions

ProDiff: Progressive fast diffusion model for high-quality text-to-speech

R Huang, Z Zhao, H Liu, J Liu, C Cui… - Proceedings of the 30th …, 2022 - dl.acm.org

Denoising diffusion probabilistic models (DDPMs) have recently achieved leading
performances in many generative tasks. However, the inherited iterative sampling process …

Cited by 158 - Related articles - All 3 versions

The power of generative AI: A review of requirements, models, input–output formats, evaluation metrics, and challenges

A Bandi, PVSR Adapa, YEVPK Kuchi - Future Internet, 2023 - mdpi.com

Generative artificial intelligence (AI) has emerged as a powerful technology with numerous
applications in various domains. There is a need to identify the requirements and evaluation …

Cited by 197 - Related articles - All 7 versions

When large language models meet personalization: Perspectives of challenges and opportunities

J Chen, Z Liu, X Huang, C Wu, Q Liu, G Jiang, Y Pu… - World Wide Web, 2024 - Springer

The advent of large language models marks a revolutionary breakthrough in artificial
intelligence. With the unprecedented scale of training and model parameters, the capability …

Cited by 68 - Related articles - All 2 versions

GenerSpeech: Towards style transfer for generalizable out-of-domain text-to-speech

R Huang, Y Ren, J Liu, C Cui… - Advances in Neural …, 2022 - proceedings.neurips.cc

Style transfer for out-of-domain (OOD) speech synthesis aims to generate speech samples
with unseen style (e.g., speaker identity, emotion, and prosody) derived from an acoustic …

Cited by 78 - Related articles - All 6 versions

Mega-TTS: Zero-shot text-to-speech at scale with intrinsic inductive bias

Z Jiang, Y Ren, Z Ye, J Liu, C Zhang, Q Yang… - arXiv preprint arXiv …, 2023 - arxiv.org

Scaling text-to-speech to a large and wild dataset has been proven to be highly effective in
achieving timbre and speech style generalization, particularly in zero-shot TTS. However …

Cited by 52 - Related articles - All 2 versions