Metricgan: Generative adversarial networks based black-box metric scores optimization for...

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

被引用次数：118 相关文章所有 6 个版本

Generative adversarial networks for speech processing: A review

A Wali, Z Alamgir, S Karim, A Fawaz, MB Ali… - Computer Speech & …, 2022 - Elsevier

Generative adversarial networks (GANs) have seen remarkable progress in recent years.
They are used as generative models for all kinds of data such as text, images, audio, music …

被引用次数：51 相关文章所有 2 个版本

[PDF] arxiv.org

Real time speech enhancement in the waveform domain

A Defossez, G Synnaeve, Y Adi - arXiv preprint arXiv:2006.12847, 2020 - arxiv.org

We present a causal speech enhancement model working on the raw waveform that runs in
real-time on a laptop CPU. The proposed model is based on an encoder-decoder …

被引用次数：450 相关文章所有 8 个版本

[PDF] arxiv.org

Conditional diffusion probabilistic model for speech enhancement

YJ Lu, ZQ Wang, S Watanabe… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Speech enhancement is a critical component of many user-oriented audio applications, yet
current systems still suffer from distorted and unnatural outputs. While generative models …

被引用次数：134 相关文章所有 7 个版本

[PDF] arxiv.org

Metricgan+: An improved version of metricgan for speech enhancement

SW Fu, C Yu, TA Hsieh, P Plantinga… - arXiv preprint arXiv …, 2021 - arxiv.org

The discrepancy between the cost function used for training a speech enhancement model
and human auditory perception usually makes the quality of enhanced speech …

被引用次数：201 相关文章所有 9 个版本

[PDF] arxiv.org

TSTNN: Two-stage transformer based neural network for speech enhancement in the time domain

K Wang, B He, WP Zhu - ICASSP 2021-2021 IEEE International …, 2021 - ieeexplore.ieee.org

In this paper, we propose a transformer-based architecture, called two-stage transformer
neural network (TSTNN) for end-to-end speech denoising in the time domain. The proposed …

被引用次数：161 相关文章所有 3 个版本

[PDF] github.io

Two heads are better than one: A two-stage complex spectral mapping approach for monaural speech enhancement

A Li, W Liu, C Zheng, C Fan, X Li - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org

For challenging acoustic scenarios as low signal-to-noise ratios, current speech
enhancement systems usually suffer from performance bottleneck in extracting the target …

被引用次数：136 相关文章所有 3 个版本

[PDF] arxiv.org

A survey on audio diffusion models: Text to speech synthesis and enhancement in generative ai

C Zhang, C Zhang, S Zheng, M Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Generative AI has demonstrated impressive performance in various fields, among which
speech synthesis is an interesting direction. With the diffusion model as the most popular …

被引用次数：53 相关文章所有 4 个版本

[PDF] arxiv.org

Cold diffusion for speech enhancement

H Yen, FG Germain, G Wichern… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Diffusion models have recently shown promising results for difficult enhancement tasks such
as the conditional and unconditional restoration of natural images and audio signals. In this …

被引用次数：79 相关文章所有 9 个版本

[PDF] arxiv.org

Glance and gaze: A collaborative learning framework for single-channel speech enhancement

A Li, C Zheng, L Zhang, X Li - Applied Acoustics, 2022 - Elsevier

The capability of the human to pay attention to both coarse and fine-grained regions has
been applied to computer vision tasks. Motivated by that, we propose a collaborative …

被引用次数：112 相关文章所有 3 个版本