Perceptually-motivated environment-specific speech enhancement

J Serrà, S Pascual, J Pons, RO Araz… - arXiv preprint arXiv …, 2022 - arxiv.org

Removing background noise from speech audio has been the subject of considerable effort,
especially in recent years due to the rise of virtual communication and amateur recordings …

Cited by 83 - Related articles - All 4 versions

[PDF] arxiv.org

HiFi-GAN: High-fidelity denoising and dereverberation based on speech deep features in adversarial networks

J Su, Z Jin, A Finkelstein - arXiv preprint arXiv:2006.05694, 2020 - arxiv.org

Real-world audio recordings are often degraded by factors such as noise, reverberation,
and equalization distortion. This paper introduces HiFi-GAN, a deep learning method to …

Cited by 163 - Related articles - All 10 versions

[PDF] arxiv.org

CDPAM: Contrastive learning for perceptual audio similarity

P Manocha, Z Jin, R Zhang… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

Many speech processing methods based on deep learning require an automatic and
differentiable audio metric for the loss function. The DPAM approach of Manocha et al.[1] …

Cited by 70 - Related articles - All 5 versions

[PDF] princeton.edu

HiFi-GAN-2: Studio-quality speech enhancement via generative adversarial networks conditioned on acoustic features

J Su, Z Jin, A Finkelstein - … of Signal Processing to Audio and …, 2021 - ieeexplore.ieee.org

Modern speech content creation tasks such as podcasts, video voice-overs, and audio
books require studio-quality audio with full bandwidth and balanced equalization (EQ) …

Cited by 53 - Related articles - All 5 versions

[PDF] neurips.cc

NORESQA: A framework for speech quality assessment using non-matching references

P Manocha, B Xu, A Kumar - Advances in neural …, 2021 - proceedings.neurips.cc

The perceptual task of speech quality assessment (SQA) is a challenging task for machines
to do. Objective SQA methods that rely on the availability of the corresponding clean …

Cited by 40 - Related articles - All 8 versions

[PDF] arxiv.org

Speech quality assessment through MOS using non-matching references

P Manocha, A Kumar - arXiv preprint arXiv:2206.12285, 2022 - arxiv.org

Human judgments obtained through Mean Opinion Scores (MOS) are the most reliable way
to assess the quality of speech signals. However, several recent attempts to automatically …

Cited by 23 - Related articles - All 5 versions

[PDF] princeton.edu

Acoustic matching by embedding impulse responses

J Su, Z Jin, A Finkelstein - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org

The goal of acoustic matching is to transform an audio recording made in one acoustic
environment to sound as if it had been recorded in a different environment, based on …

Cited by 33 - Related articles - All 7 versions

[PDF] arxiv.org

InQSS: a speech intelligibility and quality assessment model using a multi-task learning network

YW Chen, Y Tsao - arXiv preprint arXiv:2111.02585, 2021 - arxiv.org

Speech intelligibility and quality assessment models are essential tools for researchers to
evaluate and improve speech processing models. However, only a few studies have …

Cited by 18 - Related articles - All 6 versions

[PDF] ieee.org

Causal Diffusion Models for Generalized Speech Enhancement

J Richter, S Welker, JM Lemercier, B Lay… - IEEE Open Journal …, 2024 - ieeexplore.ieee.org

In this work, we present a causal speech enhancement system that is designed to handle
different types of corruptions. This paper is an extended version of our contribution to the …

Cited by 4 - Related articles

SQAPP: No-reference speech quality assessment via pairwise preference

P Manocha, Z Jin, A Finkelstein - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org

Automatic speech quality assessment remains challenging, as we lack complete models of
human auditory perception. Many existing full-reference models correlate well with human …

Cited by 7 - Related articles - All 3 versions