Multi-modal multi-channel target speech separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org

Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

Cited by 293 - Related articles - All 6 versions

Neural target speech extraction: An overview

K Zmolikova, M Delcroix, T Ochiai… - IEEE Signal …, 2023 - ieeexplore.ieee.org

Humans can listen to a target speaker even in challenging acoustic conditions that have
noise, reverberation, and interfering speakers. This phenomenon is known as the cocktail …

Cited by 80 - Related articles - All 5 versions

Deep audio-visual learning: A survey

H Zhu, MD Luo, R Wang, AH Zheng, R He - International Journal of …, 2021 - Springer

Audio-visual learning, aimed at exploiting the relationship between audio and visual
modalities, has drawn considerable attention since deep learning started to be used …

Cited by 190 - Related articles - All 12 versions

ADL-MVDR: All deep learning MVDR beamformer for target speech separation

Z Zhang, Y Xu, M Yu, SX Zhang… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

Speech separation algorithms are often used to separate the target speech from other
interfering sources. However, purely neural network based speech separation systems often …

Cited by 143 - Related articles - All 7 versions

Reading to listen at the cocktail party: Multi-modal speech separation

A Rahimi, T Afouras… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

The goal of this paper is speech separation and enhancement in multi-speaker and noisy
environments using a combination of different modalities. Previous works have shown good …

Cited by 28 - Related articles - All 8 versions

DF-Conformer: Integrated architecture of Conv-TasNet and Conformer using linear complexity self-attention for speech enhancement

Y Koizumi, S Karita, S Wisdom… - … IEEE Workshop on …, 2021 - ieeexplore.ieee.org

Single-channel speech enhancement (SE) is an important task in speech processing. A
widely used framework combines an analysis/synthesis filterbank with a mask prediction …

Cited by 52 - Related articles - All 7 versions

Audio-visual end-to-end multi-channel speech separation, dereverberation and recognition

G Li, J Deng, M Geng, Z Jin, T Wang… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

Accurate recognition of cocktail party speech containing overlapping speakers, noise and
reverberation remains a highly challenging task to date. Motivated by the invariance of …

Cited by 13 - Related articles - All 6 versions

Complex neural spatial filter: Enhancing multi-channel target speech separation in complex domain

R Gu, SX Zhang, Y Zou, D Yu - IEEE Signal Processing Letters, 2021 - ieeexplore.ieee.org

To date, mainstream target speech separation (TSS) approaches are formulated to estimate
the complex ratio mask (cRM) of target speech in time-frequency domain under supervised …

Cited by 44 - Related articles - All 5 versions

X-TF-GridNet: A time–frequency domain target speaker extraction network with adaptive speaker embedding fusion

F Hao, X Li, C Zheng - Information Fusion, 2024 - Elsevier

Target speaker extraction (TSE) which has the capability to directly extract desired speech
given enrollment utterances of the target speaker has attracted more and more attention for …

Cited by 7 - Related articles

Generalized spatio-temporal RNN beamformer for target speech separation

Y Xu, Z Zhang, M Yu, SX Zhang, D Yu - arXiv preprint arXiv:2101.01280, 2021 - arxiv.org

Although the conventional mask-based minimum variance distortionless response (MVDR)
could reduce the non-linear distortion, the residual noise level of the MVDR separated …

Cited by 47 - Related articles - All 7 versions