USEV: Universal speaker extraction with visual cue
A speaker extraction algorithm seeks to extract the target speaker's speech from a multi-
talker speech mixture. The prior studies focus mostly on speaker extraction from a highly …
NeuroHeed: Neuro-steered speaker extraction using EEG signals
Humans possess the remarkable ability to selectively attend to a single speaker amidst
competing voices and background noise, known as selective auditory attention. Recent …
X-sepformer: End-to-end speaker extraction network with explicit optimization on speaker confusion
K Liu, Z Du, X Wan, H Zhou - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Target speech extraction (TSE) systems are designed to extract target speech from a multi-
talker mixture. The popular training objective for most prior TSE networks is to enhance …
Rethinking the visual cues in audio-visual speaker extraction
The Audio-Visual Speaker Extraction (AVSE) algorithm employs parallel video recording to
leverage two visual cues, namely speaker identity and synchronization, to enhance …
Time-domain speech separation networks with graph encoding auxiliary
End-to-end time-domain speech separation with masking strategy has shown its
performance advantage, where a 1-D convolutional layer is used as the speech encoder to …
Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction
Target speech extraction aims to extract, based on a given conditioning cue, a target speech
signal that is corrupted by interfering sources, such as noise or competing speakers …
CrossNet: Leveraging Global, Cross-Band, Narrow-Band, and Positional Encoding for Single- and Multi-Channel Speaker Separation
VA Kalkhorani, DL Wang - arXiv preprint arXiv:2403.03411, 2024 - arxiv.org
We introduce CrossNet, a complex spectral mapping approach to speaker separation and
enhancement in reverberant and noisy conditions. The proposed architecture comprises an …
An electroglottograph auxiliary neural network for target speaker extraction
L Chen, Z Mo, J Ren, C Cui, Q Zhao - Applied Sciences, 2022 - mdpi.com
The extraction of a target speaker from mixtures of different speakers has attracted extensive
amounts of attention and research. Previous studies have proposed several methods, such …
ImagineNet: Target speaker extraction with intermittent visual cue through embedding inpainting
The speaker extraction technique seeks to single out the voice of a target speaker from the
interfering voices in a speech mixture. Typically an auxiliary reference of the target speaker …
TF-CrossNet: Leveraging Global, Cross-Band, Narrow-Band, and Positional Encoding for Single- and Multi-Channel Speaker Separation
VA Kalkhorani, DL Wang - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
We introduce TF-CrossNet, a complex spectral mapping approach to speaker separation
and enhancement in reverberant and noisy conditions. The proposed architecture …