NSE-CATNet: deep neural speech enhancement using convolutional attention transformer network

N Saleem, TS Gunawan, M Kartiwi, BS Nugroho… - IEEE …, 2023 - ieeexplore.ieee.org
Speech enhancement (SE) is a critical aspect of various speech-processing applications.
Recent research in this field focuses on identifying effective ways to capture the long-term …

Restoring Speaking Lips from Occlusion for Audio-Visual Speech Recognition

J Wang, Z Pan, M Zhang, RT Tan, H Li - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Prior studies on audio-visual speech recognition typically assume the visibility of speaking
lips, ignoring the fact that visual occlusion occurs in real-world videos, thus adversely …

Multi-attention bottleneck for gated convolutional encoder-decoder-based speech enhancement

N Saleem, TS Gunawan, M Shafi, S Bourouis… - IEEE …, 2023 - ieeexplore.ieee.org
Convolutional encoder-decoder (CED) has emerged as a powerful architecture, particularly
in speech enhancement (SE), which aims to improve the intelligibility and quality and …

An empirical study on the impact of positional encoding in transformer-based monaural speech enhancement

Q Zhang, M Ge, H Zhu, E Ambikairajah… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Transformer architecture has enabled recent progress in speech enhancement. Since
Transformers are position-agnostic, positional encoding is the de facto standard component …

Multi-stage progressive learning-based speech enhancement using time–frequency attentive squeezed temporal convolutional networks

C Jannu, SD Vanambathina - Circuits, Systems, and Signal Processing, 2023 - Springer
Speech enhancement is an important method for improving speech quality and intelligibility
in noisy environments. An effective speech enhancement model depends on precise …

Dual-Branch Knowledge Distillation for Noise-Robust Synthetic Speech Detection

C Fan, M Ding, J Tao, R Fu, J Yi… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Most research in synthetic speech detection (SSD) focuses on improving performance on
standard noise-free datasets. However, in actual situations, noise interference is usually …

Ripple sparse self-attention for monaural speech enhancement

Q Zhang, H Zhu, Q Song, X Qian… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
The use of Transformer represents a recent success in speech enhancement. However, as
its core component, self-attention suffers from quadratic complexity, which is computationally …

Mamba in Speech: Towards an Alternative to Self-Attention

X Zhang, Q Zhang, H Liu, T Xiao, X Qian… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformer and its derivatives have achieved success in diverse tasks across computer
vision, natural language processing, and speech processing. To reduce the complexity of …

A Multiscale Autoencoder (MSAE) Framework for End-to-End Neural Network Speech Enhancement

BJ Borgström, MS Brandstein - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Neural network approaches to single-channel speech enhancement have received much
recent attention. In particular, mask-based architectures have achieved significant …

Monaural speech separation method based on recurrent attention with parallel branches

X Yang, C Bao, X Zhang, X Chen - Proc. Interspeech, 2023 - drive.google.com
In many speech separation methods, the contextual information contained in the feature
sequence is mainly modeled by recurrent layer and/or self-attention mechanism. However …