NSE-CATNet: deep neural speech enhancement using convolutional attention transformer network
Speech enhancement (SE) is a critical aspect of various speech-processing applications.
Recent research in this field focuses on identifying effective ways to capture the long-term …
Restoring Speaking Lips from Occlusion for Audio-Visual Speech Recognition
Prior studies on audio-visual speech recognition typically assume the visibility of speaking
lips, ignoring the fact that visual occlusion occurs in real-world videos, thus adversely …
Multi-attention bottleneck for gated convolutional encoder-decoder-based speech enhancement
Convolutional encoder-decoder (CED) has emerged as a powerful architecture, particularly
in speech enhancement (SE), which aims to improve the intelligibility and quality and …
An empirical study on the impact of positional encoding in transformer-based monaural speech enhancement
Transformer architecture has enabled recent progress in speech enhancement. Since
Transformers are position-agnostic, positional encoding is the de facto standard component …
Multi-stage progressive learning-based speech enhancement using time–frequency attentive squeezed temporal convolutional networks
C Jannu, SD Vanambathina - Circuits, Systems, and Signal Processing, 2023 - Springer
Speech enhancement is an important method for improving speech quality and intelligibility
in noisy environments. An effective speech enhancement model depends on precise …
Dual-Branch Knowledge Distillation for Noise-Robust Synthetic Speech Detection
Most research in synthetic speech detection (SSD) focuses on improving performance on
standard noise-free datasets. However, in actual situations, noise interference is usually …
Ripple sparse self-attention for monaural speech enhancement
The use of Transformer represents a recent success in speech enhancement. However, as
its core component, self-attention suffers from quadratic complexity, which is computationally …
Mamba in Speech: Towards an Alternative to Self-Attention
Transformer and its derivatives have achieved success in diverse tasks across computer
vision, natural language processing, and speech processing. To reduce the complexity of …
A Multiscale Autoencoder (MSAE) Framework for End-to-End Neural Network Speech Enhancement
BJ Borgström, MS Brandstein - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Neural network approaches to single-channel speech enhancement have received much
recent attention. In particular, mask-based architectures have achieved significant …
Monaural speech separation method based on recurrent attention with parallel branches
X Yang, C Bao, X Zhang, X Chen - Proc. Interspeech, 2023 - drive.google.com
In many speech separation methods, the contextual information contained in the feature
sequence is mainly modeled by recurrent layer and/or self-attention mechanism. However …