USEV: Universal speaker extraction with visual cue

Z Pan, M Ge, H Li - IEEE/ACM Transactions on Audio, Speech …, 2022 - ieeexplore.ieee.org
A speaker extraction algorithm seeks to extract the target speaker's speech from a
multi-talker speech mixture. The prior studies focus mostly on speaker extraction from a highly …

NeuroHeed: Neuro-steered speaker extraction using EEG signals

Z Pan, M Borsdorf, S Cai, T Schultz… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Humans possess the remarkable ability to selectively attend to a single speaker amidst
competing voices and background noise, known as selective auditory attention. Recent …

X-SepFormer: End-to-end speaker extraction network with explicit optimization on speaker confusion

K Liu, Z Du, X Wan, H Zhou - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Target speech extraction (TSE) systems are designed to extract target speech from a
multi-talker mixture. The popular training objective for most prior TSE networks is to enhance …

Rethinking the visual cues in audio-visual speaker extraction

J Li, M Ge, R Cao, L Wang, J Dang, S Zhang - arXiv preprint arXiv …, 2023 - arxiv.org
The Audio-Visual Speaker Extraction (AVSE) algorithm employs parallel video recording to
leverage two visual cues, namely speaker identity and synchronization, to enhance …

Time-domain speech separation networks with graph encoding auxiliary

T Wang, Z Pan, M Ge, Z Yang… - IEEE Signal Processing …, 2023 - ieeexplore.ieee.org
End-to-end time-domain speech separation with masking strategy has shown its
performance advantage, where a 1-D convolutional layer is used as the speech encoder to …

Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction

Z Pan, G Wichern, Y Masuyama… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Target speech extraction aims to extract, based on a given conditioning cue, a target speech
signal that is corrupted by interfering sources, such as noise or competing speakers …

CrossNet: Leveraging Global, Cross-Band, Narrow-Band, and Positional Encoding for Single- and Multi-Channel Speaker Separation

VA Kalkhorani, DL Wang - arXiv preprint arXiv:2403.03411, 2024 - arxiv.org
We introduce CrossNet, a complex spectral mapping approach to speaker separation and
enhancement in reverberant and noisy conditions. The proposed architecture comprises an …

An electroglottograph auxiliary neural network for target speaker extraction

L Chen, Z Mo, J Ren, C Cui, Q Zhao - Applied Sciences, 2022 - mdpi.com
The extraction of a target speaker from mixtures of different speakers has attracted extensive
amounts of attention and research. Previous studies have proposed several methods, such …

ImagineNet: Target speaker extraction with intermittent visual cue through embedding inpainting

Z Pan, W Wang, M Borsdorf, H Li - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
The speaker extraction technique seeks to single out the voice of a target speaker from the
interfering voices in a speech mixture. Typically an auxiliary reference of the target speaker …

TF-CrossNet: Leveraging Global, Cross-Band, Narrow-Band, and Positional Encoding for Single- and Multi-Channel Speaker Separation

VA Kalkhorani, DL Wang - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
We introduce TF-CrossNet, a complex spectral mapping approach to speaker separation
and enhancement in reverberant and noisy conditions. The proposed architecture …