TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing...

[HTML] Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge

S Leglaive, M Fraticelli, H ElGhazaly, L Borne… - Computer Speech & …, 2025 - Elsevier

Supervised models for speech enhancement are trained using artificially generated mixtures
of clean speech and noise signals. However, the synthetic training conditions may not …

Cited by 1 · Related articles · All 14 versions

Uncertainty as a predictor: Leveraging self-supervised learning for zero-shot MOS prediction

A Ravuri, E Cooper, J Yamagishi - 2024 IEEE International …, 2024 - ieeexplore.ieee.org

Predicting audio quality in voice synthesis and conversion systems is a critical yet
challenging task, especially when traditional methods like Mean Opinion Scores (MOS) are …

Cited by 2 · Related articles · All 2 versions

SKILL: Similarity-aware Knowledge distILLation for Speech Self-Supervised Learning

L Zampierin, GB Hacene, B Nguyen… - arXiv preprint arXiv …, 2024 - arxiv.org

Self-supervised learning (SSL) has achieved remarkable success across various
speech-processing tasks. To enhance its efficiency, previous works often leverage the use of …

Cited by 1 · Related articles · All 2 versions

Less Peaky and More Accurate CTC Forced Alignment by Label Priors

R Huang, X Zhang, Z Ni, L Sun, M Hira… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Connectionist temporal classification (CTC) models are known to have peaky output
distributions. Such behavior is not a problem for automatic speech recognition (ASR), but it …

Open-Source Conversational AI with SpeechBrain 1.0

M Ravanelli, T Parcollet, A Moumen… - arXiv preprint arXiv …, 2024 - arxiv.org

SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused
particularly on speech processing tasks such as speech recognition, speech enhancement …

GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement

Y Yang, Z Song, J Zhuo, M Cui, J Li, B Yang… - arXiv preprint arXiv …, 2024 - arxiv.org

The evolution of speech technology has been spurred by the rapid increase in dataset sizes.
Traditional speech models generally depend on a large amount of labeled training data …

SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models

C Yin, TS Chi, Y Tsao, HM Wang - arXiv preprint arXiv:2406.08445, 2024 - arxiv.org

Representations from pre-trained speech foundation models (SFMs) have shown impressive
performance in many downstream tasks. However, the potential benefits of incorporating pre …