A hybrid continuity loss to reduce over-suppression for time-domain target speaker extraction

Z Pan, M Ge, H Li - IEEE/ACM Transactions on Audio, Speech …, 2022 - ieeexplore.ieee.org

A speaker extraction algorithm seeks to extract the target speaker's speech from a
multi-talker speech mixture. The prior studies focus mostly on speaker extraction from a highly …

Cited by 44 - Related articles - All 4 versions

NeuroHeed: Neuro-steered speaker extraction using EEG signals

Z Pan, M Borsdorf, S Cai, T Schultz… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

Humans possess the remarkable ability to selectively attend to a single speaker amidst
competing voices and background noise, known as selective auditory attention. Recent …

Cited by 9 - Related articles - All 2 versions

X-sepformer: End-to-end speaker extraction network with explicit optimization on speaker confusion

K Liu, Z Du, X Wan, H Zhou - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

Target speech extraction (TSE) systems are designed to extract target speech from a
multi-talker mixture. The popular training objective for most prior TSE networks is to enhance …

Cited by 21 - Related articles - All 3 versions

Rethinking the visual cues in audio-visual speaker extraction

J Li, M Ge, R Cao, L Wang, J Dang, S Zhang - arXiv preprint arXiv …, 2023 - arxiv.org

The Audio-Visual Speaker Extraction (AVSE) algorithm employs parallel video recording to
leverage two visual cues, namely speaker identity and synchronization, to enhance …

Cited by 8 - Related articles - All 5 versions

Time-domain speech separation networks with graph encoding auxiliary

T Wang, Z Pan, M Ge, Z Yang… - IEEE Signal Processing …, 2023 - ieeexplore.ieee.org

End-to-end time-domain speech separation with masking strategy has shown its
performance advantage, where a 1-D convolutional layer is used as the speech encoder to …

Cited by 12 - Related articles - All 2 versions

Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction

Z Pan, G Wichern, Y Masuyama… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

Target speech extraction aims to extract, based on a given conditioning cue, a target speech
signal that is corrupted by interfering sources, such as noise or competing speakers …

Cited by 5 - Related articles - All 6 versions

CrossNet: Leveraging Global, Cross-Band, Narrow-Band, and Positional Encoding for Single- and Multi-Channel Speaker Separation

VA Kalkhorani, DL Wang - arXiv preprint arXiv:2403.03411, 2024 - arxiv.org

We introduce CrossNet, a complex spectral mapping approach to speaker separation and
enhancement in reverberant and noisy conditions. The proposed architecture comprises an …

Cited by 5 - Related articles

An electroglottograph auxiliary neural network for target speaker extraction

L Chen, Z Mo, J Ren, C Cui, Q Zhao - Applied Sciences, 2022 - mdpi.com

The extraction of a target speaker from mixtures of different speakers has attracted extensive
amounts of attention and research. Previous studies have proposed several methods, such …

Cited by 4 - Related articles - All 3 versions

ImagineNet: Target speaker extraction with intermittent visual cue through embedding inpainting

Z Pan, W Wang, M Borsdorf, H Li - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

The speaker extraction technique seeks to single out the voice of a target speaker from the
interfering voices in a speech mixture. Typically an auxiliary reference of the target speaker …

Cited by 9 - Related articles - All 3 versions

TF-CrossNet: Leveraging Global, Cross-Band, Narrow-Band, and Positional Encoding for Single- and Multi-Channel Speaker Separation

VA Kalkhorani, DL Wang - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org

We introduce TF-CrossNet, a complex spectral mapping approach to speaker separation
and enhancement in reverberant and noisy conditions. The proposed architecture …