A comprehensive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT
Recently, ChatGPT, along with DALL-E-2 and Codex, has been gaining significant attention
from society. As a result, many individuals have become interested in related resources and …
A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
Neural codec language models are zero-shot text to speech synthesizers
We introduce a language modeling approach for text to speech synthesis (TTS). Specifically,
we train a neural codec language model (called Vall-E) using discrete codes derived from …
Make-An-Audio: Text-to-audio generation with prompt-enhanced diffusion models
Large-scale multimodal generative modeling has created milestones in text-to-image and
text-to-video generation. Its application to audio still lags behind for two main reasons: the …
NaturalSpeech: End-to-end text-to-speech synthesis with human-level quality
Text-to-speech (TTS) has made rapid progress in both academia and industry in recent
years. Some questions naturally arise that whether a TTS system can achieve human-level …
ProDiff: Progressive fast diffusion model for high-quality text-to-speech
Denoising diffusion probabilistic models (DDPMs) have recently achieved leading
performances in many generative tasks. However, the inherited iterative sampling process …
The power of generative AI: A review of requirements, models, input–output formats, evaluation metrics, and challenges
A Bandi, PVSR Adapa, YEVPK Kuchi - Future Internet, 2023 - mdpi.com
Generative artificial intelligence (AI) has emerged as a powerful technology with numerous
applications in various domains. There is a need to identify the requirements and evaluation …
When large language models meet personalization: Perspectives of challenges and opportunities
The advent of large language models marks a revolutionary breakthrough in artificial
intelligence. With the unprecedented scale of training and model parameters, the capability …
GenerSpeech: Towards style transfer for generalizable out-of-domain text-to-speech
Style transfer for out-of-domain (OOD) speech synthesis aims to generate speech samples
with unseen style (e.g., speaker identity, emotion, and prosody) derived from an acoustic …
Mega-TTS: Zero-shot text-to-speech at scale with intrinsic inductive bias
Scaling text-to-speech to a large and wild dataset has been proven to be highly effective in
achieving timbre and speech style generalization, particularly in zero-shot TTS. However …