An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org
Speech enhancement and speech separation are two related tasks whose purpose is to
extract a single target speech signal or several target speech signals, respectively, from a mixture of sounds …

DCCRN: Deep complex convolution recurrent network for phase-aware speech enhancement

Y Hu, Y Liu, S Lv, M Xing, S Zhang, Y Fu, J Wu… - arXiv preprint arXiv …, 2020 - arxiv.org
Speech enhancement has benefited from the success of deep learning in terms of
intelligibility and perceptual quality. Conventional time-frequency (TF) domain methods …
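
To make the complex-domain idea concrete, the sketch below shows a complex-valued 2-D convolution of the kind a DCCRN-style encoder is built from; the layer, shapes, and names are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class ComplexConv2d(nn.Module):
        """Illustrative complex-valued convolution on (real, imag) spectrogram pairs."""
        def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
            super().__init__()
            self.conv_r = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)  # real kernel
            self.conv_i = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)  # imaginary kernel

        def forward(self, x_r, x_i):
            # (W_r + jW_i)(x_r + jx_i) = (W_r x_r - W_i x_i) + j(W_r x_i + W_i x_r)
            return self.conv_r(x_r) - self.conv_i(x_i), self.conv_r(x_i) + self.conv_i(x_r)

    # Toy usage: real/imag parts of a complex spectrogram, shape (batch, channel, freq, frames).
    x_r, x_i = torch.randn(1, 1, 257, 100), torch.randn(1, 1, 257, 100)
    y_r, y_i = ComplexConv2d(1, 16, kernel_size=3, padding=1)(x_r, x_i)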

Channel-attention dense u-net for multichannel speech enhancement

B Tolooshams, R Giri, AH Song, U Isik… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Supervised deep learning has gained significant attention for speech enhancement recently.
The state-of-the-art deep learning methods perform the task by learning a ratio/binary mask …
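
As a reminder of what these mask targets look like, here is a small numpy sketch of an ideal ratio mask and ideal binary mask computed from a known speech/noise decomposition (one common definition among several; the signals below are random stand-ins).

    import numpy as np
    from scipy.signal import stft

    fs = 16000
    speech = np.random.randn(fs)          # stand-in for one second of clean speech
    noise = 0.5 * np.random.randn(fs)     # stand-in for additive noise
    _, _, S = stft(speech, fs, nperseg=512)
    _, _, N = stft(noise, fs, nperseg=512)

    irm = np.abs(S) / (np.abs(S) + np.abs(N) + 1e-8)   # ideal ratio mask, values in [0, 1]
    ibm = (np.abs(S) > np.abs(N)).astype(float)        # ideal binary mask

    _, _, X = stft(speech + noise, fs, nperseg=512)    # mixture spectrogram
    S_hat = irm * X                                    # masking yields the enhanced estimate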

Beam-TasNet: Time-domain audio separation network meets frequency-domain beamformer

T Ochiai, M Delcroix, R Ikeshita… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Recent studies have shown that acoustic beamforming using a microphone array plays an
important role in the construction of high-performance automatic speech recognition (ASR) …
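
For readers unfamiliar with mask-based beamforming, the following is a hedged sketch of MVDR weights computed per frequency from speech and noise spatial covariance matrices (the Souden formulation often paired with neural separators); the function and variable names are assumptions for illustration, not the paper's code.

    import numpy as np

    def mvdr_weights(R_s, R_n, ref_mic=0):
        """MVDR weights per frequency bin from (F, M, M) speech/noise covariance matrices."""
        F, M, _ = R_s.shape
        u = np.zeros(M)
        u[ref_mic] = 1.0                       # one-hot selector for the reference microphone
        w = np.zeros((F, M), dtype=complex)
        for f in range(F):
            numerator = np.linalg.solve(R_n[f], R_s[f])            # R_n^{-1} R_s
            w[f] = (numerator @ u) / (np.trace(numerator) + 1e-8)
        return w

    # With Y the (F, T, M) multichannel mixture STFT, the beamformed estimate is
    # S_hat[f, t] = w[f].conj() @ Y[f, t].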

SpatialNet: Extensively learning spatial information for multichannel joint speech separation, denoising and dereverberation

C Quan, X Li - IEEE/ACM Transactions on Audio, Speech, and …, 2024 - ieeexplore.ieee.org
This work proposes SpatialNet, a neural network that extensively exploits spatial information for
multichannel joint speech separation, denoising and dereverberation. In …

A comprehensive study of speech separation: spectrogram vs waveform separation

F Bahmaninezhad, J Wu, R Gu, SX Zhang, Y Xu… - arXiv preprint arXiv …, 2019 - arxiv.org
Speech separation has been studied widely for single-channel close-talk microphone
recordings over the past few years; the developed solutions are mostly in the frequency domain …

Enhancing end-to-end multi-channel speech separation via spatial feature learning

R Gu, SX Zhang, L Chen, Y Xu, M Yu… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Hand-crafted spatial features (e.g., the inter-channel phase difference, IPD) play a fundamental
role in recent deep-learning-based multi-channel speech separation (MCSS) methods …
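
Since the IPD feature is central to this line of work, here is a short numpy sketch of how it is typically computed from a multichannel STFT (the 4-channel signal below is a random stand-in).

    import numpy as np
    from scipy.signal import stft

    fs = 16000
    x = np.random.randn(4, fs)                   # stand-in for a 4-channel, one-second recording
    _, _, X = stft(x, fs, nperseg=512)           # shape: (channels, freq, frames)

    ref = 0                                      # reference microphone
    ipd = np.angle(X[1:] * np.conj(X[ref]))      # inter-channel phase difference vs. the reference
    cos_ipd, sin_ipd = np.cos(ipd), np.sin(ipd)  # the trigonometric form often fed to networks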

Demystifying TasNet: A dissecting approach

J Heitkaemper, D Jakobeit, C Boeddeker… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
In recent years, time-domain speech separation has excelled over frequency-domain
separation in single-channel scenarios and noise-free environments. In this paper we …
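
As a quick reference for what distinguishes the time-domain approach, the sketch below shows the learned 1-D convolutional encoder/decoder that TasNet-style models use in place of the STFT; the sizes and the random mask are illustrative only.

    import torch
    import torch.nn as nn

    win, stride, n_basis = 32, 16, 256
    encoder = nn.Conv1d(1, n_basis, kernel_size=win, stride=stride, bias=False)
    decoder = nn.ConvTranspose1d(n_basis, 1, kernel_size=win, stride=stride, bias=False)

    x = torch.randn(1, 1, 16000)                     # one second of audio at 16 kHz
    feats = torch.relu(encoder(x))                   # learned, spectrogram-like representation
    mask = torch.sigmoid(torch.randn_like(feats))    # stand-in for a separator network's mask
    x_hat = decoder(mask * feats)                    # masked features decoded back to a waveform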

Complex neural spatial filter: Enhancing multi-channel target speech separation in complex domain

R Gu, SX Zhang, Y Zou, D Yu - IEEE Signal Processing Letters, 2021 - ieeexplore.ieee.org
To date, mainstream target speech separation (TSS) approaches are formulated to estimate
the complex ratio mask (cRM) of target speech in the time-frequency domain under supervised …
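
For context, the complex ratio mask differs from magnitude masks in that it acts on real and imaginary parts jointly; a minimal numpy sketch of the ideal cRM and its application is given below (the signals are random stand-ins).

    import numpy as np

    F, T = 257, 100
    S = np.random.randn(F, T) + 1j * np.random.randn(F, T)        # target STFT (stand-in)
    Y = S + np.random.randn(F, T) + 1j * np.random.randn(F, T)    # mixture STFT (stand-in)

    crm = S * np.conj(Y) / (np.abs(Y) ** 2 + 1e-8)   # ideal cRM, i.e. S / Y computed stably
    S_hat = crm * Y                                  # complex product restores magnitude and phase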

LAVSS: Location-guided audio-visual spatial audio separation

Y Ye, W Yang, Y Tian - Proceedings of the IEEE/CVF Winter …, 2024 - openaccess.thecvf.com
Existing machine learning research has achieved promising results in monaural audio-
visual separation (MAVS). However, most MAVS methods purely consider what the sound …