A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
Generative adversarial networks for speech processing: A review
Generative adversarial networks (GANs) have seen remarkable progress in recent years.
They are used as generative models for all kinds of data such as text, images, audio, music …
Real time speech enhancement in the waveform domain
We present a causal speech enhancement model working on the raw waveform that runs in
real-time on a laptop CPU. The proposed model is based on an encoder-decoder …
Conditional diffusion probabilistic model for speech enhancement
Speech enhancement is a critical component of many user-oriented audio applications, yet
current systems still suffer from distorted and unnatural outputs. While generative models …
MetricGAN+: An improved version of MetricGAN for speech enhancement
The discrepancy between the cost function used for training a speech enhancement model
and human auditory perception usually makes the quality of enhanced speech …
TSTNN: Two-stage transformer based neural network for speech enhancement in the time domain
K Wang, B He, WP Zhu - ICASSP 2021-2021 IEEE International …, 2021 - ieeexplore.ieee.org
In this paper, we propose a transformer-based architecture, called two-stage transformer
neural network (TSTNN) for end-to-end speech denoising in the time domain. The proposed …
Two heads are better than one: A two-stage complex spectral mapping approach for monaural speech enhancement
For challenging acoustic scenarios such as low signal-to-noise ratios, current speech
enhancement systems usually suffer from a performance bottleneck in extracting the target …
A survey on audio diffusion models: Text to speech synthesis and enhancement in generative AI
Generative AI has demonstrated impressive performance in various fields, among which
speech synthesis is an interesting direction. With the diffusion model as the most popular …
Cold diffusion for speech enhancement
Diffusion models have recently shown promising results for difficult enhancement tasks such
as the conditional and unconditional restoration of natural images and audio signals. In this …
Glance and gaze: A collaborative learning framework for single-channel speech enhancement
The capability of the human to pay attention to both coarse and fine-grained regions has
been applied to computer vision tasks. Motivated by that, we propose a collaborative …