Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge
S Leglaive, M Fraticelli, H ElGhazaly, L Borne… - Computer Speech & …, 2025 - Elsevier
Supervised models for speech enhancement are trained using artificially generated mixtures
of clean speech and noise signals. However, the synthetic training conditions may not …
Uncertainty as a predictor: Leveraging self-supervised learning for zero-shot MOS prediction
A Ravuri, E Cooper, J Yamagishi - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Predicting audio quality in voice synthesis and conversion systems is a critical yet
challenging task, especially when traditional methods like Mean Opinion Scores (MOS) are …
SKILL: Similarity-aware Knowledge distILLation for Speech Self-Supervised Learning
Self-supervised learning (SSL) has achieved remarkable success across various speech-
processing tasks. To enhance its efficiency, previous works often leverage the use of …
Less Peaky and More Accurate CTC Forced Alignment by Label Priors
Connectionist temporal classification (CTC) models are known to have peaky output
distributions. Such behavior is not a problem for automatic speech recognition (ASR), but it …
Open-Source Conversational AI with SpeechBrain 1.0
SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused
particularly on speech processing tasks such as speech recognition, speech enhancement …
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
The evolution of speech technology has been spurred by the rapid increase in dataset sizes.
Traditional speech models generally depend on a large amount of labeled training data …
SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models
Representations from pre-trained speech foundation models (SFMs) have shown impressive
performance in many downstream tasks. However, the potential benefits of incorporating pre …