Overview of speaker modeling and its applications: From the lens of deep speaker representation learning

S Wang, Z Chen, KA Lee, Y Qian… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Speaker individuality information is among the most critical elements within speech signals.
When this information is modeled thoroughly and accurately, it can be utilized in various …

Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification

T Liu, KA Lee, Q Wang, H Li - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
Residual neural networks (ResNets) demonstrate impressive performance in
automatic speaker verification (ASV). They treat the time and frequency dimensions equally …

Prompt-driven target speech diarization

Y Jiang, Z Chen, R Tao, L Deng… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
We introduce a novel task named 'target speech diarization', which seeks to determine
'when the target event occurred' within an audio signal. We devise a neural architecture called …

Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization

R Tao, Z Shi, Y Jiang, DT Truong, ES Chng… - Proceedings of the …, 2024 - dl.acm.org
The human brain has the capability to associate an unknown person's voice and face by
leveraging their general relationship, referred to as "cross-modal speaker verification". This …

Speech foundation model ensembles for the controlled singing voice deepfake detection (CtrSVDD) challenge 2024

A Guragain, T Liu, Z Pan, HB Sailor, Q Wang - arXiv preprint arXiv …, 2024 - arxiv.org
This work details our approach to achieving a leading system with a 1.79% pooled equal
error rate (EER) on the evaluation set of the Controlled Singing Voice Deepfake Detection …
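The equal error rate (EER) cited above is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below is a minimal, hypothetical illustration, not the challenge's official scoring tool: it assumes higher scores mean "accept" (e.g. bona fide), sweeps every observed score as a candidate threshold, and reports the average of the two error rates where they are closest. The function name `equal_error_rate` is an illustrative assumption; established toolkits usually interpolate along the ROC curve instead.

```python
def equal_error_rate(positive_scores, negative_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    Hypothetical helper: positive_scores should be accepted,
    negative_scores rejected; a trial is accepted when score >= threshold.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")

    best_gap = float("inf")
    best_eer = 1.0
    for threshold in sorted(set(positive_scores) | set(negative_scores)):
        # False acceptance: negatives scored at or above the threshold.
        far = sum(s >= threshold for s in negative_scores) / len(negative_scores)
        # False rejection: positives scored below the threshold.
        frr = sum(s < threshold for s in positive_scores) / len(positive_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    print(equal_error_rate([0.9, 0.8], [0.1, 0.2]))  # perfectly separated scores
    print(equal_error_rate([0.6, 0.4], [0.5, 0.3]))  # overlapping scores
```

With perfectly separated scores the sweep finds a threshold with no errors, so the EER is 0; with overlapping scores it settles where both error rates are equal.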

SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech

J Lin, M Ge, J Ao, L Deng, H Li - arXiv preprint arXiv:2407.02826, 2024 - arxiv.org
It was shown that pre-trained models with self-supervised learning (SSL) techniques are
effective in various downstream speech tasks. However, most such models are trained on …

Gradient weighting for speaker verification in extremely low Signal-to-Noise Ratio

Y Ma, KA Lee, V Hautamäki, M Ge… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
Speaker verification is hampered by background noise, particularly at extremely low Signal-to-Noise
Ratio (SNR) under 0 dB. It is difficult to suppress noise without introducing …

Deep attentive adaptive filter module in residual blocks for text-independent speaker verification

HB Kashani - Engineering Applications of Artificial Intelligence, 2024 - Elsevier
Text-independent speaker verification is a challenging research field in biometric user
authentication, a major application of artificial intelligence. It determines whether two …

Cosine Scoring with Uncertainty for Neural Speaker Embedding

Q Wang, KA Lee - IEEE Signal Processing Letters, 2024 - ieeexplore.ieee.org
Uncertainty modeling in speaker representation aims to learn the variability present in
speech utterances. While the conventional cosine-scoring is computationally efficient and …
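For reference, the conventional cosine scoring mentioned above compares two fixed speaker embeddings by the cosine of the angle between them; it carries no uncertainty information, which is the gap the cited letter targets. The sketch below shows only that conventional baseline, not the paper's uncertainty-aware method. The function name `cosine_score` is an illustrative assumption, and in practice the score would be compared with a tuned decision threshold.

```python
import math


def cosine_score(emb_a, emb_b):
    """Conventional cosine similarity between two speaker embeddings.

    Hypothetical helper: returns a value in [-1, 1]; higher means the
    two embeddings are more likely to come from the same speaker.
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_score([1.0, 0.0], [0.0, 1.0]))  # orthogonal embeddings
    print(cosine_score([1.0, 2.0], [2.0, 4.0]))  # same direction, different scale
```

Because the embeddings are length-normalized inside the score, only their direction matters; this is part of why cosine scoring is cheap and widely used as a backend.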

How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?

T Liu, L Zhang, RK Das, Y Ma, R Tao, H Li - arXiv preprint arXiv …, 2024 - arxiv.org
Partially manipulating a sentence can greatly change its meaning. Recent work shows that
countermeasures (CMs) trained on partially spoofed audio can effectively detect such …