StyleTTS 2: Towards human-level text-to-speech through style diffusion and adversarial training with large speech language models

YA Li, C Han, V Raghavan… - Advances in Neural …, 2024 - proceedings.neurips.cc
In this paper, we present StyleTTS 2, a text-to-speech (TTS) model that leverages style
diffusion and adversarial training with large speech language models (SLMs) to achieve …

Reimagining speech: a scoping review of deep learning-based methods for non-parallel voice conversion

AR Bargum, S Serafin, C Erkut - Frontiers in Signal Processing, 2024 - frontiersin.org
Research on deep learning-powered voice conversion (VC) in speech-to-speech scenarios
is gaining increasing popularity. Although many of the works in the field of voice …

Disentanglement in a GAN for unconditional speech synthesis

M Baas, H Kamper - IEEE/ACM Transactions on Audio, Speech …, 2024 - ieeexplore.ieee.org
Can we develop a model that can synthesize realistic speech directly from a latent space,
without explicit conditioning? Despite several efforts over the last decade, previous …

PVGAN: a pathological voice generation model incorporating a progressive nesting strategy

X Pan, T Feng, N Zhang - Journal of Voice, 2023 - Elsevier
The voice generation task is to solve the problem of limited samples in the voice dataset
using computer technology. By increasing the number of samples, the accuracy of voice …

Enhancing GAN performance through neural architecture search and tensor decomposition

PR Pulakurthi, M Mozaffari, SA Dianat… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Generative Adversarial Networks (GANs) have emerged as a powerful tool for generating
high-fidelity content. This paper presents a new training procedure that leverages Neural …

A User-Guided Generation Framework for Personalized Music Synthesis Using Interactive Evolutionary Computation

Y Wang, Y Pei, Z Ma, J Li - Proceedings of the Genetic and Evolutionary …, 2024 - dl.acm.org
The development of generative artificial intelligence (AI) has demonstrated notable
advancements in the domain of music synthesis. However, a perceived lack of creativity in …

EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis

G Zhu, Y Wen, MA Carbonneau, Z Duan - arXiv preprint arXiv:2311.08667, 2023 - arxiv.org
Audio diffusion models can synthesize a wide variety of sounds. Existing models often
operate on the latent domain with cascaded phase recovery modules to reconstruct …

RAVE for Speech: Efficient Voice Conversion at High Sampling Rates

AR Bargum, S Lajboschitz, C Erkut - arXiv preprint arXiv:2408.16546, 2024 - arxiv.org
Voice conversion has gained increasing popularity within the field of audio manipulation
and speech synthesis. Often, the main objective is to transfer the input identity to that of a …

Disentangled Representations in Speech Processing Applications

M Baas - 2024 - rf5.github.io
A central goal in systems that produce speech is to easily control high-level characteristics of
the speech while retaining naturalness. If we had such a system, it would enable a range of …

Exploring GANs With Conv-TasNet: Adversarial Training for Speech Separation

A Lakandri - 2024 - rave.ohiolink.edu
Generative Adversarial Networks (GANs) were initially developed for computer
vision tasks and have shown impressive capabilities in enhancing the performance of …