Attention is all you need in speech separation
Recurrent Neural Networks (RNNs) have long been the dominant architecture in sequence-
to-sequence learning. RNNs, however, are inherently sequential models that do not allow …
Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis
P Ochieng - Artificial Intelligence Review, 2023 - Springer
Deep neural networks (DNN) techniques have become pervasive in domains such as
natural language processing and computer vision. They have achieved great success in …
Two heads are better than one: A two-stage complex spectral mapping approach for monaural speech enhancement
For challenging acoustic scenarios such as low signal-to-noise ratios, current speech
enhancement systems usually suffer from a performance bottleneck in extracting the target …
Glance and gaze: A collaborative learning framework for single-channel speech enhancement
The capability of the human to pay attention to both coarse and fine-grained regions has
been applied to computer vision tasks. Motivated by that, we propose a collaborative …
TF-GridNet: Integrating full- and sub-band modeling for speech separation
We propose TF-GridNet for speech separation. The model is a novel deep neural network
(DNN) integrating full- and sub-band modeling in the time-frequency (TF) domain. It stacks …
TF-GridNet: Making time-frequency domain models great again for monaural speaker separation
We propose TF-GridNet, a novel multi-path deep neural network (DNN) operating in the time-
frequency (TF) domain, for monaural talker-independent speaker separation in anechoic …
RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing
We present RemixIT, a simple yet effective self-supervised method for training speech
enhancement without the need for a single isolated in-domain speech or noise waveform …
An efficient encoder-decoder architecture with top-down attention for speech separation
Deep neural networks have shown excellent prospects in speech separation tasks.
However, obtaining good results while keeping a low model complexity remains challenging …
Speech separation using an asynchronous fully recurrent convolutional neural network
Recent advances in the design of neural network architectures, in particular those
specialized in modeling sequences, have provided significant improvements in speech …
MossFormer: Pushing the performance limit of monaural speech separation using gated single-head transformer with convolution-augmented joint self-attentions
S Zhao, B Ma - … 2023-2023 IEEE International Conference on …, 2023 - ieeexplore.ieee.org
Transformer based models have provided significant performance improvements in
monaural speech separation. However, there is still a performance gap compared to a …