PAM: Prompting audio-language models for audio quality assessment

S Deshmukh, D Alharthi, B Elizalde, H Gamper… - arXiv preprint arXiv …, 2024 - arxiv.org
While audio quality is a key performance metric for various audio processing tasks, including
generative modeling, its objective measurement remains a challenge. Audio-Language …

DMDSpeech: Distilled Diffusion Model Surpassing The Teacher in Zero-shot Speech Synthesis via Direct Metric Optimization

YA Li, R Kumar, Z Jin - arXiv preprint arXiv:2410.11097, 2024 - arxiv.org
Diffusion models have demonstrated significant potential in speech synthesis tasks,
including text-to-speech (TTS) and voice cloning. However, their iterative denoising …

Exploring the Accuracy of Prosodic Encodings in State-of-the-Art Text-to-Speech Models

C Chan, J Kuang - Proc. SpeechProsody 2024, 2024 - isca-archive.org
Modern speech synthesis models have achieved increasingly humanlike outputs, and have
particularly been shown to be practically indistinguishable from natural speech at the phone …

Open-Source Multispeaker Text-to-Speech Model and Synthetic Speech Corpus with a Mexican Accent through a Web Spanish Dictionary

CDH Mena, JO Giraldo, IB de la Pena, A Medina… - isca-archive.org
Although European Spanish has abundant resources in the speech field, ASR
systems often struggle with Spanish of other world regions. Improving ASR accuracy can be …