StyleTTS 2: Towards human-level text-to-speech through style diffusion and adversarial training with large speech language models
In this paper, we present StyleTTS 2, a text-to-speech (TTS) model that leverages style
diffusion and adversarial training with large speech language models (SLMs) to achieve …
Reimagining speech: a scoping review of deep learning-based methods for non-parallel voice conversion
Research on deep learning-powered voice conversion (VC) in speech-to-speech scenarios
is gaining increasing popularity. Although many of the works in the field of voice …
Disentanglement in a GAN for unconditional speech synthesis
Can we develop a model that can synthesize realistic speech directly from a latent space,
without explicit conditioning? Despite several efforts over the last decade, previous …
PVGAN: a pathological voice generation model incorporating a progressive nesting strategy
X Pan, T Feng, N Zhang - Journal of Voice, 2023 - Elsevier
The voice generation task is to solve the problem of limited samples in the voice dataset
using computer technology. By increasing the number of samples, the accuracy of voice …
Enhancing GAN performance through neural architecture search and tensor decomposition
Generative Adversarial Networks (GANs) have emerged as a powerful tool for generating
high-fidelity content. This paper presents a new training procedure that leverages Neural …
A User-Guided Generation Framework for Personalized Music Synthesis Using Interactive Evolutionary Computation
The development of generative artificial intelligence (AI) has demonstrated notable
advancements in the domain of music synthesis. However, a perceived lack of creativity in …
EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis
Audio diffusion models can synthesize a wide variety of sounds. Existing models often
operate on the latent domain with cascaded phase recovery modules to reconstruct …
RAVE for Speech: Efficient Voice Conversion at High Sampling Rates
Voice conversion has gained increasing popularity within the field of audio manipulation
and speech synthesis. Often, the main objective is to transfer the input identity to that of a …
Disentangled Representations in Speech Processing Applications
M Baas - 2024 - rf5.github.io
A central goal in systems that produce speech is to easily control high-level characteristics of
the speech while retaining naturalness. If we had such a system, it would enable a range of …
Exploring GANs With Conv-TasNet: Adversarial Training for Speech Separation
A Lakandri - 2024 - rave.ohiolink.edu
Generative Adversarial Networks (GANs) were initially developed for computer
vision tasks and have shown impressive capabilities in enhancing the performance of …