MFA-Conformer: Multi-scale feature aggregation Conformer for automatic speaker verification

Y Zhang, Z Lv, H Wu, S Zhang, P Hu, Z Wu… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we present Multi-scale Feature Aggregation Conformer (MFA-Conformer), an
easy-to-implement, simple but effective backbone for automatic speaker verification based …

Overview of speaker modeling and its applications: From the lens of deep speaker representation learning

S Wang, Z Chen, KA Lee, Y Qian… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Speaker individuality information is among the most critical elements within speech signals.
When this information is modeled thoroughly and accurately, it can be utilized in various …

CAM++: A fast and efficient network for speaker verification using context-aware masking

H Wang, S Zheng, Y Chen, L Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Time delay neural network (TDNN) has been proven to be efficient for speaker verification.
One of its successful variants, ECAPA-TDNN, achieved state-of-the-art performance at the …

Self-supervised speaker recognition with loss-gated learning

R Tao, KA Lee, RK Das… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
In self-supervised learning for speaker recognition, pseudo labels are useful as the
supervision signals. It is a known fact that a speaker recognition model doesn't always …

Target active speaker detection with audio-visual cues

Y Jiang, R Tao, Z Pan, H Li - arXiv preprint arXiv:2305.12831, 2023 - arxiv.org
In active speaker detection (ASD), we would like to detect whether an on-screen person is
speaking based on audio-visual cues. Previous studies have primarily focused on modeling …

Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification

T Liu, KA Lee, Q Wang, H Li - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
Residual neural networks (ResNets) have demonstrated impressive performance in
automatic speaker verification (ASV). They treat the time and frequency dimensions equally …

NeuroHeed: Neuro-steered speaker extraction using EEG signals

Z Pan, M Borsdorf, S Cai, T Schultz… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Humans possess the remarkable ability to selectively attend to a single speaker amidst
competing voices and background noise, known as selective auditory attention. Recent …

Scoring of large-margin embeddings for speaker verification: Cosine or PLDA?

Q Wang, KA Lee, T Liu - arXiv preprint arXiv:2204.03965, 2022 - arxiv.org
The emergence of large-margin softmax cross-entropy losses in training deep speaker
embedding neural networks has triggered a gradual shift from parametric back-ends to a …

Frequency and multi-scale selective kernel attention for speaker verification

SH Mun, J Jung, MH Han… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
The majority of recent state-of-the-art speaker verification architectures adopt multi-scale
processing and frequency-channel attention mechanisms. Convolutional layers of these …

Speaker recognition with two-step multi-modal deep cleansing

R Tao, KA Lee, Z Shi, H Li - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Neural network-based speaker recognition has achieved significant improvement in recent
years. A robust speaker representation learns meaningful knowledge from both hard and …