MFA-Conformer: Multi-scale feature aggregation conformer for automatic speaker verification
In this paper, we present Multi-scale Feature Aggregation Conformer (MFA-Conformer), an
easy-to-implement, simple but effective backbone for automatic speaker verification based …
Overview of speaker modeling and its applications: From the lens of deep speaker representation learning
Speaker individuality information is among the most critical elements within speech signals.
By thoroughly and accurately modeling this information, one can utilize it in various …
CAM++: A fast and efficient network for speaker verification using context-aware masking
Time delay neural network (TDNN) has been proven to be efficient for speaker verification.
One of its successful variants, ECAPA-TDNN, achieved state-of-the-art performance at the …
Self-supervised speaker recognition with loss-gated learning
In self-supervised learning for speaker recognition, pseudo labels are useful as the
supervision signals. It is a known fact that a speaker recognition model doesn't always …
Target active speaker detection with audio-visual cues
In active speaker detection (ASD), we would like to detect whether an on-screen person is
speaking based on audio-visual cues. Previous studies have primarily focused on modeling …
Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification
Residual neural networks (ResNets) have demonstrated impressive performance in
automatic speaker verification (ASV). They treat the time and frequency dimensions equally …
NeuroHeed: Neuro-steered speaker extraction using EEG signals
Humans possess the remarkable ability to selectively attend to a single speaker amidst
competing voices and background noise, known as selective auditory attention. Recent …
Scoring of large-margin embeddings for speaker verification: Cosine or PLDA?
The emergence of large-margin softmax cross-entropy losses in training deep speaker
embedding neural networks has triggered a gradual shift from parametric back-ends to a …
Frequency and multi-scale selective kernel attention for speaker verification
The majority of recent state-of-the-art speaker verification architectures adopt multi-scale
processing and frequency-channel attention mechanisms. Convolutional layers of these …
Speaker recognition with two-step multi-modal deep cleansing
Neural network-based speaker recognition has achieved significant improvement in recent
years. A robust speaker representation learns meaningful knowledge from both hard and …