NaturalSpeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers

K Shen, Z Ju, X Tan, Y Liu, Y Leng, L He, T Qin… - arXiv preprint arXiv …, 2023 - arxiv.org

Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is
important to capture the diversity in human speech such as speaker identities, prosodies …

Cited by 104 · Related articles · All 3 versions

[PDF] ieee.org

Advancements in Generative AI: A Comprehensive Review of GANs, GPT, Autoencoders, Diffusion Model, and Transformers.

S Bengesi, H El-Sayed, MK Sarker, Y Houkpati… - IEEE …, 2024 - ieeexplore.ieee.org

The launch of ChatGPT in 2022 garnered global attention, marking a significant milestone in
the Generative Artificial Intelligence (GAI) field. While GAI has been in effect for the past …

Cited by 14 · Related articles · All 4 versions

[PDF] acm.org

Generative artificial intelligence in learning analytics: Contextualising opportunities and challenges through the learning analytics cycle

L Yan, R Martinez-Maldonado, D Gasevic - Proceedings of the 14th …, 2024 - dl.acm.org

Generative artificial intelligence (GenAI), exemplified by ChatGPT, Midjourney, and other
state-of-the-art large language models and diffusion models, holds significant potential for …

Cited by 17 · Related articles · All 5 versions

[PDF] neurips.cc

Understanding diffusion objectives as the ELBO with simple data augmentation

D Kingma, R Gao - Advances in Neural Information …, 2024 - proceedings.neurips.cc

To achieve the highest perceptual quality, state-of-the-art diffusion models are optimized
with objectives that typically look very different from the maximum likelihood and the …

Cited by 33 · Related articles · All 6 versions

[PDF] arxiv.org

SpeechX: Neural codec language model as a versatile speech transformer

X Wang, M Thakker, Z Chen, N Kanda… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Recent advancements in generative speech models based on audio-text prompts have
enabled remarkable innovations like high-quality zero-shot text-to-speech. However …

Cited by 38 · Related articles · All 2 versions

[PDF] arxiv.org

Seamless: Multilingual Expressive and Streaming Speech Translation

L Barrault, YA Chung, MC Meglioli, D Dale… - arXiv preprint arXiv …, 2023 - arxiv.org

Large-scale automatic speech translation systems today lack key features that help machine-
mediated communication feel seamless when compared to human-to-human dialogue. In …

Cited by 37 · Related articles

[PDF] arxiv.org

PromptTTS 2: Describing and generating voices with text prompt

Y Leng, Z Guo, K Shen, X Tan, Z Ju, Y Liu, Y Liu… - arXiv preprint arXiv …, 2023 - arxiv.org

Speech conveys more information than just text, as the same word can be uttered in various
voices to convey diverse information. Compared to traditional text-to-speech (TTS) methods …

Cited by 16 · Related articles · All 3 versions

[PDF] arxiv.org

NaturalSpeech 3: Zero-shot speech synthesis with factorized codec and diffusion models

Z Ju, Y Wang, K Shen, X Tan, D Xin, D Yang… - arXiv preprint arXiv …, 2024 - arxiv.org

While recent large-scale text-to-speech (TTS) models have achieved significant progress,
they still fall short in speech quality, similarity, and prosody. Considering speech intricately …

Cited by 23 · Related articles · All 4 versions

[PDF] arxiv.org

Generative pre-training for speech with flow matching

AH Liu, M Le, A Vyas, B Shi, A Tjandra… - arXiv preprint arXiv …, 2023 - arxiv.org

Generative models have gained more and more attention in recent years for their
remarkable success in tasks that required estimating and sampling data distribution to …

Cited by 11 · Related articles · All 3 versions

[PDF] arxiv.org

ADriver-I: A general world model for autonomous driving

F Jia, W Mao, Y Liu, Y Zhao, Y Wen, C Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Typically, autonomous driving adopts a modular design, which divides the full stack into
perception, prediction, planning and control parts. Though interpretable, such modular …

Cited by 15 · Related articles · All 2 versions