Speech enhancement and dereverberation with diffusion-based generative models

J Richter, S Welker, JM Lemercier… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In this work, we build upon our previous publication and use diffusion-based generative
models for speech enhancement. We present a detailed overview of the diffusion process …
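The entry above describes score/diffusion-based enhancement, where clean speech is gradually corrupted toward the noisy recording and a network learns to reverse that process. Below is a minimal sketch of one forward (noising) step under an assumed geometric noise schedule; the function name, schedule, and constants are illustrative, not the authors' implementation.

```python
# Minimal sketch of a forward diffusion (noising) step for speech enhancement:
# interpolate from clean speech x0 toward the noisy recording y while injecting
# Gaussian noise. Schedule and names are illustrative assumptions.
import numpy as np

def forward_diffuse(x0, y, t, sigma_min=0.05, sigma_max=0.5):
    """Sample x_t given clean speech x0, noisy speech y, and time t in [0, 1]."""
    mean = (1.0 - t) * x0 + t * y                      # drift toward the noisy signal
    sigma = sigma_min * (sigma_max / sigma_min) ** t   # geometric noise schedule
    return mean + sigma * np.random.randn(*x0.shape)

# Example: diffuse a 1-second clean/noisy pair halfway along the process.
sr = 16000
x0 = 0.1 * np.random.randn(sr)          # stand-in for clean speech
y = x0 + 0.05 * np.random.randn(sr)     # stand-in for the noisy recording
x_t = forward_diffuse(x0, y, t=0.5)
```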

Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis

P Ochieng - Artificial Intelligence Review, 2023 - Springer
Deep neural network (DNN) techniques have become pervasive in domains such as
natural language processing and computer vision. They have achieved great success in …

Deep learning-based non-intrusive multi-objective speech assessment model with cross-domain features

RE Zezario, SW Fu, F Chen, CS Fuh… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org
This study proposes a cross-domain multi-objective speech assessment model, called
MOSA-Net, which can simultaneously estimate the speech quality, intelligibility, and …
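The snippet describes a single model that predicts several assessment targets at once. A minimal multi-task head is sketched below, assuming a shared encoder output feeding separate quality and intelligibility regressors; layer sizes and names are illustrative, not MOSA-Net's actual architecture.

```python
# Minimal sketch of a multi-objective assessment head: shared features feed two
# regression branches, one for quality and one for intelligibility.
import torch
import torch.nn as nn

class MultiObjectiveHead(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.quality = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))
        self.intelligibility = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, features):
        # features: (batch, time, feat_dim); pool over time before regression
        pooled = features.mean(dim=1)
        return self.quality(pooled), self.intelligibility(pooled)

head = MultiObjectiveHead()
q, i = head(torch.randn(4, 100, 256))  # batch of 4 utterances, 100 frames each
```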

Self-supervised visual acoustic matching

A Somayazulu, C Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc
Acoustic matching aims to re-synthesize an audio clip to sound as if it were recorded in a
target acoustic environment. Existing methods assume access to paired training data, where …
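Classically, acoustic matching amounts to filtering the input with the target room's impulse response; the learning-based method above approximates this without paired data. The sketch below illustrates only that classical view with a toy impulse response, not the paper's self-supervised approach.

```python
# Minimal sketch of acoustic matching as convolution with a target-room impulse
# response (RIR). The RIR here is a toy, exponentially decaying stand-in.
import numpy as np
from scipy.signal import fftconvolve

sr = 16000
source = 0.1 * np.random.randn(sr)                 # stand-in for the input clip
rir = np.zeros(int(0.3 * sr))                      # toy 300 ms impulse response
rir[0] = 1.0                                       # direct path
rir[1:] = 0.3 * np.random.randn(len(rir) - 1) * np.exp(-np.linspace(0, 8, len(rir) - 1))

matched = fftconvolve(source, rir)[: len(source)]  # clip rendered in the target room
```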

Unsupervised speech enhancement using dynamical variational autoencoders

X Bie, S Leglaive, X Alameda-Pineda… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org
Dynamical variational autoencoders (DVAEs) are a class of deep generative models with
latent variables, dedicated to modeling time series of high-dimensional data. DVAEs can be …
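A DVAE generates each frame from a latent state that evolves over time. The sketch below shows that generative recurrence with simple linear-Gaussian stand-ins for the learned transition and decoder networks; all names and dimensions are illustrative assumptions.

```python
# Minimal sketch of the DVAE generative recurrence: z_t evolves from z_{t-1}
# and each observation x_t is generated from z_t.
import numpy as np

def generate_sequence(T=100, latent_dim=16, obs_dim=257, seed=0):
    rng = np.random.default_rng(seed)
    A = 0.9 * np.eye(latent_dim)                            # latent transition (learned in a DVAE)
    C = 0.1 * rng.standard_normal((obs_dim, latent_dim))    # emission / decoder
    z = rng.standard_normal(latent_dim)
    frames = []
    for _ in range(T):
        z = A @ z + 0.1 * rng.standard_normal(latent_dim)           # p(z_t | z_{t-1})
        frames.append(C @ z + 0.01 * rng.standard_normal(obs_dim))  # p(x_t | z_t)
    return np.stack(frames)

spectrogram_like = generate_sequence()  # (T, obs_dim) sequence of frames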

TEA-PSE 3.0: Tencent-Ethereal-Audio-Lab Personalized Speech Enhancement System for ICASSP 2023 DNS-Challenge

Y Ju, J Chen, S Zhang, S He, W Rao… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
This paper introduces the Unbeatable Team's submission to the ICASSP 2023 Deep Noise
Suppression (DNS) Challenge. We expand our previous work, TEA-PSE, to its upgraded …

Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech

SW Fu, KH Hung, Y Tsao, YCF Wang - arXiv preprint arXiv:2402.16321, 2024 - arxiv.org
Speech quality estimation has recently undergone a paradigm shift from human-hearing
expert designs to machine-learning models. However, current models rely mainly on …

Integrating uncertainty into neural network-based speech enhancement

H Fang, D Becker, S Wermter… - IEEE/ACM Transactions …, 2023 - ieeexplore.ieee.org
Supervised masking approaches in the time-frequency domain aim to employ deep neural
networks to estimate a multiplicative mask to extract clean speech. This leads to a single …
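Since the entry above hinges on multiplicative time-frequency masking, here is a minimal sketch of the idea: a mask is applied to the noisy STFT and the result is inverted back to a waveform. A DNN would predict the mask in practice; the oracle ratio mask and all parameters here are illustrative assumptions.

```python
# Minimal sketch of time-frequency masking: apply a multiplicative mask to the
# noisy STFT, then invert. An oracle ratio mask stands in for a DNN estimate.
import numpy as np
import librosa

sr = 16000
clean = 0.1 * np.random.randn(sr)        # stand-in for clean speech
noisy = clean + 0.05 * np.random.randn(sr)

S_noisy = librosa.stft(noisy, n_fft=512, hop_length=128)
S_clean = librosa.stft(clean, n_fft=512, hop_length=128)

# In practice a DNN predicts this mask; here we form an oracle ratio mask.
mask = np.abs(S_clean) / (np.abs(S_clean) + np.abs(S_noisy - S_clean) + 1e-8)
enhanced = librosa.istft(mask * S_noisy, hop_length=128, length=len(noisy))
```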

USDnet: Unsupervised Speech Dereverberation via Neural Forward Filtering

ZQ Wang - arXiv preprint arXiv:2402.00820, 2024 - arxiv.org
In reverberant conditions with a single speaker, each far-field microphone records a
reverberant version of the same speaker signal at a different location. In over-determined …
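The observation model the entry refers to is that every far-field microphone records the same dry source convolved with its own room impulse response. The sketch below simulates that multi-microphone model with toy impulse responses; it is not the paper's neural forward-filtering method.

```python
# Minimal sketch of the multi-microphone reverberation model: each mic observes
# the same dry source convolved with a mic-specific room impulse response (RIR).
import numpy as np
from scipy.signal import fftconvolve

sr = 16000
dry = 0.1 * np.random.randn(sr)          # stand-in for the dry speaker signal
num_mics = 4

def toy_rir(length, seed):
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(length) * np.exp(-np.linspace(0, 6, length))
    h[0] = 1.0                           # direct path
    return h

mics = np.stack([fftconvolve(dry, toy_rir(int(0.2 * sr), m))[: len(dry)]
                 for m in range(num_mics)])   # (num_mics, samples)
```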

HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders

D Kim, SW Chung, H Han, Y Ji, HG Kang - arXiv preprint arXiv:2306.01411, 2023 - arxiv.org
This paper introduces an end-to-end neural speech restoration model, HD-DEMUCS,
demonstrating efficacy across multiple distortion environments. Unlike conventional …