Universal speech enhancement with score-based diffusion
Removing background noise from speech audio has been the subject of considerable effort,
especially in recent years due to the rise of virtual communication and amateur recordings …
HiFi-GAN: High-fidelity denoising and dereverberation based on speech deep features in adversarial networks
Real-world audio recordings are often degraded by factors such as noise, reverberation,
and equalization distortion. This paper introduces HiFi-GAN, a deep learning method to …
CDPAM: Contrastive learning for perceptual audio similarity
Many speech processing methods based on deep learning require an automatic and
differentiable audio metric for the loss function. The DPAM approach of Manocha et al.[1] …
HiFi-GAN-2: Studio-quality speech enhancement via generative adversarial networks conditioned on acoustic features
Modern speech content creation tasks such as podcasts, video voice-overs, and audio
books require studio-quality audio with full bandwidth and balanced equalization (EQ) …
NORESQA: A framework for speech quality assessment using non-matching references
The perceptual task of speech quality assessment (SQA) is a challenging task for machines
to do. Objective SQA methods that rely on the availability of the corresponding clean …
Speech quality assessment through MOS using non-matching references
Human judgments obtained through Mean Opinion Scores (MOS) are the most reliable way
to assess the quality of speech signals. However, several recent attempts to automatically …
Acoustic matching by embedding impulse responses
The goal of acoustic matching is to transform an audio recording made in one acoustic
environment to sound as if it had been recorded in a different environment, based on …
InQSS: a speech intelligibility and quality assessment model using a multi-task learning network
Speech intelligibility and quality assessment models are essential tools for researchers to
evaluate and improve speech processing models. However, only a few studies have …
Causal Diffusion Models for Generalized Speech Enhancement
In this work, we present a causal speech enhancement system that is designed to handle
different types of corruptions. This paper is an extended version of our contribution to the …
SQAPP: No-reference speech quality assessment via pairwise preference
Automatic speech quality assessment remains challenging, as we lack complete models of
human auditory perception. Many existing full-reference models correlate well with human …