Universal speech enhancement with score-based diffusion

J Serrà, S Pascual, J Pons, RO Araz… - arXiv preprint arXiv …, 2022 - arxiv.org
Removing background noise from speech audio has been the subject of considerable effort,
especially in recent years due to the rise of virtual communication and amateur recordings …

HiFi-GAN: High-fidelity denoising and dereverberation based on speech deep features in adversarial networks

J Su, Z Jin, A Finkelstein - arXiv preprint arXiv:2006.05694, 2020 - arxiv.org
Real-world audio recordings are often degraded by factors such as noise, reverberation,
and equalization distortion. This paper introduces HiFi-GAN, a deep learning method to …

CDPAM: Contrastive learning for perceptual audio similarity

P Manocha, Z Jin, R Zhang… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Many speech processing methods based on deep learning require an automatic and
differentiable audio metric for the loss function. The DPAM approach of Manocha et al. [1] …

HiFi-GAN-2: Studio-quality speech enhancement via generative adversarial networks conditioned on acoustic features

J Su, Z Jin, A Finkelstein - … of Signal Processing to Audio and …, 2021 - ieeexplore.ieee.org
Modern speech content creation tasks such as podcasts, video voice-overs, and audio
books require studio-quality audio with full bandwidth and balanced equalization (EQ) …

NORESQA: A framework for speech quality assessment using non-matching references

P Manocha, B Xu, A Kumar - Advances in neural …, 2021 - proceedings.neurips.cc
The perceptual task of speech quality assessment (SQA) is a challenging task for machines
to do. Objective SQA methods that rely on the availability of the corresponding clean …

Speech quality assessment through MOS using non-matching references

P Manocha, A Kumar - arXiv preprint arXiv:2206.12285, 2022 - arxiv.org
Human judgments obtained through Mean Opinion Scores (MOS) are the most reliable way
to assess the quality of speech signals. However, several recent attempts to automatically …

Acoustic matching by embedding impulse responses

J Su, Z Jin, A Finkelstein - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
The goal of acoustic matching is to transform an audio recording made in one acoustic
environment to sound as if it had been recorded in a different environment, based on …

InQSS: a speech intelligibility and quality assessment model using a multi-task learning network

YW Chen, Y Tsao - arXiv preprint arXiv:2111.02585, 2021 - arxiv.org
Speech intelligibility and quality assessment models are essential tools for researchers to
evaluate and improve speech processing models. However, only a few studies have …

Causal Diffusion Models for Generalized Speech Enhancement

J Richter, S Welker, JM Lemercier, B Lay… - IEEE Open Journal …, 2024 - ieeexplore.ieee.org
In this work, we present a causal speech enhancement system that is designed to handle
different types of corruptions. This paper is an extended version of our contribution to the …

SQAPP: No-reference speech quality assessment via pairwise preference

P Manocha, Z Jin, A Finkelstein - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Automatic speech quality assessment remains challenging, as we lack complete models of
human auditory perception. Many existing full-reference models correlate well with human …