Overview of speaker modeling and its applications: From the lens of deep speaker representation learning
Speaker individuality information is among the most critical elements within speech signals.
When this information is modeled thoroughly and accurately, it can be utilized in various …
Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification
Residual neural networks (ResNet) demonstrate impressive performance in
automatic speaker verification (ASV). They treat the time and frequency dimensions equally …
Prompt-driven target speech diarization
We introduce a novel task named 'target speech diarization', which seeks to determine
'when target event occurred' within an audio signal. We devise a neural architecture called …
Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization
The human brain has the capability to associate an unknown person's voice and face by
leveraging their general relationship, referred to as "cross-modal speaker verification". This …
Speech foundation model ensembles for the controlled singing voice deepfake detection (CtrSVDD) challenge 2024
This work details our approach to achieving a leading system with a 1.79% pooled equal
error rate (EER) on the evaluation set of the Controlled Singing Voice Deepfake Detection …
SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech
It has been shown that pre-trained models with self-supervised learning (SSL) techniques are
effective in various downstream speech tasks. However, most such models are trained on …
Gradient weighting for speaker verification in extremely low Signal-to-Noise Ratio
Speaker verification is hampered by background noise, particularly at extremely low
Signal-to-Noise Ratio (SNR) under 0 dB. It is difficult to suppress noise without introducing …
Deep attentive adaptive filter module in residual blocks for text-independent speaker verification
HB Kashani - Engineering Applications of Artificial Intelligence, 2024 - Elsevier
Text-independent speaker verification is a challenging research field in biometric user
authentication as a major application of artificial intelligence. It determines whether two …
Cosine Scoring with Uncertainty for Neural Speaker Embedding
Uncertainty modeling in speaker representation aims to learn the variability present in
speech utterances. While conventional cosine scoring is computationally efficient and …
How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?
Partially manipulating a sentence can greatly change its meaning. Recent work shows that
countermeasures (CMs) trained on partially spoofed audio can effectively detect such …
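The CtrSVDD entry reports its result as a pooled equal error rate (EER), and the "Cosine Scoring with Uncertainty" entry refers to conventional cosine scoring of speaker embeddings. To illustrate both terms, below is a minimal pure-Python sketch that assumes embeddings are plain lists of floats and that each trial is labeled 1 (target or bona fide) or 0 (non-target or spoof). The function names `cosine_score` and `equal_error_rate` are hypothetical and do not come from any of the listed papers. The EER is approximated by trying each observed score as a threshold, not by interpolating between operating points.

```python
import math
from typing import Sequence, Tuple


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embeddings; higher means 'more likely the same speaker'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Approximate the EER by trying every observed score as a decision threshold.

    labels[i] is 1 for a target (same-speaker or bona fide) trial and 0 otherwise.
    A trial is accepted when its score >= threshold. Returns (eer, threshold) at the
    threshold where the false-acceptance and false-rejection rates are closest.
    """
    n_target = sum(1 for lab in labels if lab == 1)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, best_eer, best_thr = float("inf"), 1.0, 0.0
    for thr in sorted(set(scores)):
        # False acceptances: non-target trials scored at or above the threshold.
        far = sum(1 for s, lab in zip(scores, labels) if lab == 0 and s >= thr) / n_nontarget
        # False rejections: target trials scored below the threshold.
        frr = sum(1 for s, lab in zip(scores, labels) if lab == 1 and s < thr) / n_target
        if abs(far - frr) < best_gap:
            best_gap, best_eer, best_thr = abs(far - frr), (far + frr) / 2, thr
    return best_eer, best_thr


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings" (hypothetical values, not from any paper).
    enroll = [0.2, 0.9, 0.1]
    trials = [([0.25, 0.85, 0.05], 1), ([0.1, 0.95, 0.2], 1),
              ([0.9, 0.1, 0.3], 0), ([0.7, 0.4, 0.6], 0)]
    scores = [cosine_score(enroll, emb) for emb, _ in trials]
    labels = [lab for _, lab in trials]
    print(equal_error_rate(scores, labels))
```

The threshold sweep is quadratic in the number of trials. That is fine for a toy example, but a real evaluation would sort the scores once and accumulate the error counts.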