GAN you hear me? Reclaiming unconditional speech synthesis from diffusion models

StyleTTS 2: Towards human-level text-to-speech through style diffusion and adversarial training with large speech language models

YA Li, C Han, V Raghavan… - Advances in Neural …, 2024 - proceedings.neurips.cc

In this paper, we present StyleTTS 2, a text-to-speech (TTS) model that leverages style
diffusion and adversarial training with large speech language models (SLMs) to achieve …

Cited by 51 Related articles All 6 versions

[PDF] frontiersin.org

Reimagining speech: a scoping review of deep learning-based methods for non-parallel voice conversion

AR Bargum, S Serafin, C Erkut - Frontiers in Signal Processing, 2024 - frontiersin.org

Research on deep learning-powered voice conversion (VC) in speech-to-speech scenarios
is gaining increasing popularity. Although many of the works in the field of voice …

Cited by 1 Related articles All 2 versions

[PDF] arxiv.org

Disentanglement in a GAN for unconditional speech synthesis

M Baas, H Kamper - IEEE/ACM Transactions on Audio, Speech …, 2024 - ieeexplore.ieee.org

Can we develop a model that can synthesize realistic speech directly from a latent space,
without explicit conditioning? Despite several efforts over the last decade, previous …

Cited by 3 Related articles All 4 versions

PVGAN: a pathological voice generation model incorporating a progressive nesting strategy

X Pan, T Feng, N Zhang - Journal of Voice, 2023 - Elsevier

The voice generation task is to solve the problem of limited samples in the voice dataset
using computer technology. By increasing the number of samples, the accuracy of voice …

Cited by 1 Related articles All 4 versions

[PDF] mahsamozaffari.com

Enhancing GAN performance through neural architecture search and tensor decomposition

PR Pulakurthi, M Mozaffari, SA Dianat… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Generative Adversarial Networks (GANs) have emerged as a powerful tool for generating
high-fidelity content. This paper presents a new training procedure that leverages Neural …

Cited by 2 Related articles All 2 versions

[PDF] human-competitive.org

A User-Guided Generation Framework for Personalized Music Synthesis Using Interactive Evolutionary Computation

Y Wang, Y Pei, Z Ma, J Li - Proceedings of the Genetic and Evolutionary …, 2024 - dl.acm.org

The development of generative artificial intelligence (AI) has demonstrated notable
advancements in the domain of music synthesis. However, a perceived lack of creativity in …

Cited by 1 Related articles

[PDF] arxiv.org

EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis

G Zhu, Y Wen, MA Carbonneau, Z Duan - arXiv preprint arXiv:2311.08667, 2023 - arxiv.org

Audio diffusion models can synthesize a wide variety of sounds. Existing models often
operate on the latent domain with cascaded phase recovery modules to reconstruct …

Cited by 4 Related articles All 7 versions

[PDF] arxiv.org

RAVE for Speech: Efficient Voice Conversion at High Sampling Rates

AR Bargum, S Lajboschitz, C Erkut - arXiv preprint arXiv:2408.16546, 2024 - arxiv.org

Voice conversion has gained increasing popularity within the field of audio manipulation
and speech synthesis. Often, the main objective is to transfer the input identity to that of a …

[PDF] Disentangled Representations in Speech Processing Applications

M Baas - 2024 - rf5.github.io

A central goal in systems that produce speech is to easily control high-level characteristics of
the speech while retaining naturalness. If we had such a system, it would enable a range of …

[PDF] ohiolink.edu

Exploring GANs With Conv-TasNet: Adversarial Training for Speech Separation

A Lakandri - 2024 - rave.ohiolink.edu

Generative Adversarial Networks (GANs) were initially developed for computer
vision tasks and have shown impressive capabilities in enhancing the performance of …