A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Generative adversarial networks for speech processing: A review

A Wali, Z Alamgir, S Karim, A Fawaz, MB Ali… - Computer Speech & …, 2022 - Elsevier
Generative adversarial networks (GANs) have seen remarkable progress in recent years.
They are used as generative models for all kinds of data such as text, images, audio, music …

Real time speech enhancement in the waveform domain

A Defossez, G Synnaeve, Y Adi - arXiv preprint arXiv:2006.12847, 2020 - arxiv.org
We present a causal speech enhancement model working on the raw waveform that runs in
real-time on a laptop CPU. The proposed model is based on an encoder-decoder …

Conditional diffusion probabilistic model for speech enhancement

YJ Lu, ZQ Wang, S Watanabe… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Speech enhancement is a critical component of many user-oriented audio applications, yet
current systems still suffer from distorted and unnatural outputs. While generative models …

MetricGAN+: An improved version of MetricGAN for speech enhancement

SW Fu, C Yu, TA Hsieh, P Plantinga… - arXiv preprint arXiv …, 2021 - arxiv.org
The discrepancy between the cost function used for training a speech enhancement model
and human auditory perception usually makes the quality of enhanced speech …

TSTNN: Two-stage transformer based neural network for speech enhancement in the time domain

K Wang, B He, WP Zhu - ICASSP 2021-2021 IEEE International …, 2021 - ieeexplore.ieee.org
In this paper, we propose a transformer-based architecture, called two-stage transformer
neural network (TSTNN) for end-to-end speech denoising in the time domain. The proposed …

Two heads are better than one: A two-stage complex spectral mapping approach for monaural speech enhancement

A Li, W Liu, C Zheng, C Fan, X Li - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
For challenging acoustic scenarios such as low signal-to-noise ratios, current speech
enhancement systems usually suffer from a performance bottleneck in extracting the target …

A survey on audio diffusion models: Text to speech synthesis and enhancement in generative AI

C Zhang, C Zhang, S Zheng, M Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative AI has demonstrated impressive performance in various fields, among which
speech synthesis is an interesting direction. With the diffusion model as the most popular …

Cold diffusion for speech enhancement

H Yen, FG Germain, G Wichern… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Diffusion models have recently shown promising results for difficult enhancement tasks such
as the conditional and unconditional restoration of natural images and audio signals. In this …

Glance and gaze: A collaborative learning framework for single-channel speech enhancement

A Li, C Zheng, L Zhang, X Li - Applied Acoustics, 2022 - Elsevier
The capability of the human to pay attention to both coarse and fine-grained regions has
been applied to computer vision tasks. Motivated by that, we propose a collaborative …