Complex-valued neural networks: A comprehensive survey

CY Lee, H Hasegawa, S Gao - IEEE/CAA Journal of …, 2022 - ieeexplore.ieee.org
Complex-valued neural networks (CVNNs) have shown excellent efficiency compared with their real-valued counterparts in speech enhancement, image and signal processing …

Audio self-supervised learning: A survey

S Liu, A Mallol-Ragolta, E Parada-Cabaleiro, K Qian… - Patterns, 2022 - cell.com
Similar to humans' cognitive ability to generalize knowledge and skills, self-supervised
learning (SSL) aims to discover general representations from large-scale data. This …

DCCRN: Deep complex convolution recurrent network for phase-aware speech enhancement

Y Hu, Y Liu, S Lv, M Xing, S Zhang, Y Fu, J Wu… - arXiv preprint arXiv …, 2020 - arxiv.org
Speech enhancement has benefited from the success of deep learning in terms of
intelligibility and perceptual quality. Conventional time-frequency (TF) domain methods …

ICASSP 2023 Deep Noise Suppression Challenge

H Dubey, A Aazami, V Gopal, B Naderi… - IEEE Open Journal …, 2024 - ieeexplore.ieee.org
The ICASSP 2023 Deep Noise Suppression (DNS) Challenge marks the fifth edition of the
DNS challenge series. DNS challenges were organized from 2019 to 2023 to foster …

Speech enhancement and dereverberation with diffusion-based generative models

J Richter, S Welker, JM Lemercier… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In this work, we build upon our previous publication and use diffusion-based generative
models for speech enhancement. We present a detailed overview of the diffusion process …

Learning complex spectral mapping with gated convolutional recurrent networks for monaural speech enhancement

K Tan, DL Wang - IEEE/ACM Transactions on Audio, Speech …, 2019 - ieeexplore.ieee.org
Phase is important for the perceptual quality of speech. However, it seems intractable to directly
estimate phase spectra through supervised learning due to their lack of spectrotemporal …

TSTNN: Two-stage transformer-based neural network for speech enhancement in the time domain

K Wang, B He, WP Zhu - ICASSP 2021-2021 IEEE International …, 2021 - ieeexplore.ieee.org
In this paper, we propose a transformer-based architecture, called the two-stage transformer
neural network (TSTNN), for end-to-end speech denoising in the time domain. The proposed …

MetricGAN: Generative adversarial networks based black-box metric scores optimization for speech enhancement

SW Fu, CF Liao, Y Tsao, SD Lin - … Conference on Machine …, 2019 - proceedings.mlr.press
Adversarial loss in a conditional generative adversarial network (GAN) is not designed to
directly optimize evaluation metrics of a target task, and thus, may not always guide the …

Two heads are better than one: A two-stage complex spectral mapping approach for monaural speech enhancement

A Li, W Liu, C Zheng, C Fan, X Li - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
For challenging acoustic scenarios such as low signal-to-noise ratios, current speech
enhancement systems usually suffer from a performance bottleneck in extracting the target …

Speech enhancement with score-based generative models in the complex STFT domain

S Welker, J Richter, T Gerkmann - arXiv preprint arXiv:2203.17004, 2022 - arxiv.org
Score-based generative models (SGMs) have recently shown impressive results for difficult
generative tasks such as the unconditional and conditional generation of natural images …