Pam: Prompting audio-language models for audio quality assessment
While audio quality is a key performance metric for various audio processing tasks, including
generative modeling, its objective measurement remains a challenge. Audio-Language …
generative modeling, its objective measurement remains a challenge. Audio-Language …
DMDSpeech: Distilled Diffusion Model Surpassing The Teacher in Zero-shot Speech Synthesis via Direct Metric Optimization
Diffusion models have demonstrated significant potential in speech synthesis tasks,
including text-to-speech (TTS) and voice cloning. However, their iterative denoising …
including text-to-speech (TTS) and voice cloning. However, their iterative denoising …
[PDF][PDF] Exploring the Accuracy of Prosodic Encodings in State-of-the-Art Text-to-Speech Models
C Chan, J Kuang - Proc. SpeechProsody 2024, 2024 - isca-archive.org
Modern speech synthesis models have achieved increasingly humanlike outputs, and have
particularly been shown to be practically indistinguishable from natural speech at the phone …
particularly been shown to be practically indistinguishable from natural speech at the phone …
[PDF][PDF] Open-Source Multispeaker Text-to-Speech Model and Synthetic Speech Corpus with a Mexican Accent through a Web Spanish Dictionary
CDH Mena, JO Giraldo, IB de la Pena, A Medina… - isca-archive.org
Abstract Although European Spanish has abundant resources in the speech field, ASR
systems often struggle with Spanish of other world regions. Improving ASR accuracy can be …
systems often struggle with Spanish of other world regions. Improving ASR accuracy can be …